Jørgensen provided a comprehensive frequentist analysis of a general class of
probability models which he referred to as dispersion models. The present work provides
a Bayesian analysis of two-parameter dispersion models under noninformative priors.
First, Jeffreys' and reference priors are discussed. The propriety of posteriors under
these priors is investigated for several members of the dispersion family. Next, the
notions of first and second order probability matching criteria are introduced, and
different priors are compared according to these criteria. In the process, some new priors
are found which are different from either Jeffreys' or reference priors. The results are
illustrated with both real and simulated data.

CHAPTER 1
INTRODUCTION 


This  chapter  contains  a  gentle  introduction  to  the  Bayesian  method,  a  literature 
review  of  pertinent  articles  and  books  related  to  the  topics  of  the  dissertation,  and 
an  overview  of  the  dissertation. 

1.1    A  Bayesian  Primer 

Bayesian analysis is perhaps best explained by contrast to frequentist (or classical)
statistical analysis. In both approaches we may let $\theta$ represent the state of nature in
which we have interest and $\Theta$ the set of all possible states of nature. In the Bayesian
paradigm, $\theta$ possesses a distribution which quantifies prior beliefs about how likely
$\theta$ is to assume any particular value in $\Theta$. In the frequentist paradigm, $\theta$ is a
fixed unknown.

In both paradigms a random variable, $Y$, has values related to the possible outcomes
of an experiment. Experiments are designed so that observations are distributed
according to some probability distribution which depends on the unknown
$\theta$. For a Bayesian, experimentation provides additional information about $\theta$, while
for a frequentist, experimentation provides the sole information about $\theta$.

In a frequentist analysis, inference is based on the distribution of $Y$. In a Bayesian
analysis, inference is based on the posterior distribution of $\theta$ conditional on the actual
value of the observed data. The formulation of the posterior is derived from Bayes'
Rule (1763).

To illustrate Bayesian inference we will consider a concrete example taken from
Berger (1985). In this example and throughout this work we will not make a notational
distinction between a random variable and the value it assumes. Consider
the situation where a child is given an intelligence test. Assume the test result $Y$ is
$N(\theta, 100)$, where $\theta$ is the true IQ of the child, as measured by the test. (In other
words, if the child were to take a large number of independent similar tests, his average
score would be about $\theta$.) Assume also that, in the population as a whole, $\theta$ is
distributed according to a $N(100, 225)$ distribution. Let the prior of $\theta$ be denoted by
$\pi(\theta)$. Then for this example

$$\pi(\theta) = N(100, 225).$$

We denote the distribution of the random variable, $Y$, conditional on $\theta$ by $f(y|\theta)$. In
this example

$$f(y|\theta) = N(\theta, 100).$$

Denote the posterior distribution of $\theta$ conditional on the observed data by $\Pi(\theta|y)$.
According to Bayes' Rule, the posterior is related to the prior and the conditional
distribution of $Y$ by

$$\Pi(\theta|y) \propto \pi(\theta) f(y|\theta),$$

where the constant of proportionality depends only on the data.

Now return to the IQ example and suppose a child scores 115 on the test. Following
Bayes' Rule the posterior distribution of his true IQ, $\theta$, is found from

$$\Pi(\theta|115) \propto \frac{1}{\sqrt{2\pi(225)}} e^{-\frac{(\theta-100)^2}{2(225)}} \cdot \frac{1}{\sqrt{2\pi(100)}} e^{-\frac{(115-\theta)^2}{2(100)}}.$$

After some algebraic manipulations we observe that $\Pi(\theta|115)$ is a normal density
with mean = 110.39 and variance = 69.23. Thus the child's true IQ, $\theta$, has a
$N(110.39, 69.23)$ posterior distribution.
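
The algebra behind these two numbers is the standard normal-normal update, in which precisions (reciprocal variances) add and the posterior mean is a precision-weighted average. The following is a minimal sketch of that calculation; the function name `normal_posterior` is illustrative and not part of the original text.

```python
def normal_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance of theta for one normal observation
    with known variance and a normal prior on theta."""
    # Precisions (1/variance) of the prior and of the observation add.
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    # The posterior mean is the precision-weighted average of prior mean and data.
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var


# IQ example: prior N(100, 225), test score 115 with variance 100.
post_mean, post_var = normal_posterior(100.0, 225.0, 115.0, 100.0)
print(post_mean, post_var)
```

The computed values agree with the quoted mean and variance up to rounding in the last digit.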




Bayesian inference is often reported in terms of credible sets. A $100(1 - \alpha)\%$
credible set for $\theta$ is a subset $C$ of $\Theta$ such that

$$1 - \alpha \le \int_C \Pi(\theta|y)\, d\theta.$$

The credible set for the IQ example is graphed in Figure 1.1.


Figure 1.1. Credible set for the IQ example


For a child scoring 115 on his IQ test a 95% credible set for $\theta$ is

(94.08, 126.70).

The classical 95% confidence interval is

(95.4, 134.6).

Thus by using prior information available about the distribution of IQ scores, the
Bayesian method incorporates information arising from sources other than the statistical
investigation. The effect in this example is to shrink the reported region where
$\theta$ is highly likely to lie.
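
Both intervals in the IQ example are equal-tailed normal intervals of the form center plus or minus a normal quantile times a standard deviation. The sketch below recomputes them under that assumption; the helper `normal_interval` is a hypothetical name introduced only for illustration.

```python
from statistics import NormalDist


def normal_interval(center, variance, level=0.95):
    """Equal-tailed interval center +/- z * sd for a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * variance ** 0.5
    return center - half_width, center + half_width


# Bayesian 95% credible set from the N(110.39, 69.23) posterior.
credible = normal_interval(110.39, 69.23)
# Classical 95% confidence interval: observed score 115, variance 100.
confidence = normal_interval(115.0, 100.0)
print(credible, confidence)
```

The credible set is narrower because the posterior variance (69.23) is smaller than the sampling variance (100) used by the classical interval.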

To  make  the  case  for  incorporating  prior  information,  Berger  (1985)  cites  Savage's 
(1961)  compelling  example  of  the  possible  importance  of  prior  information.  Consider 
the  following  experiments: 

•  A  lady  claims  to  be  able  to  tell  whether  the  tea  or  the  milk  was  poured  into  the 
cup  first.  In  ten  trials  conducted  to  test  this,  she  makes  a  correct  determination 
each  time. 

•  A  music  expert  claims  to  be  able  to  distinguish  a  page  of  Haydn  score  from  a 
page  of  Mozart  score.  In  ten  trials  conducted  to  test  this,  he  makes  a  correct 
determination  each  time. 

•  A  drunken  friend  says  he  can  predict  the  outcome  of  a  flip  of  a  fair  coin.  In  ten 
trials  conducted  to  test  this,  he  is  correct  each  time. 

A frequentist analysis would be identical in all three cases. No consideration would be
given to the knowledge that an expert is distinguishing the music scores, that a drunk
is predicting the coin toss, or that, as in the first case, the prior information is vague.

To accommodate Bayesian analysis in all situations, a prior density for $\theta$ is needed
which reflects vague or nonexistent prior information. Such a prior is referred to as
a noninformative prior. In the present work we focus on the use of noninformative
priors. We choose this focus because noninformative priors are often necessary to an
analysis and are often difficult to justify in regard to selection and validity of final
inference.

Noninformative priors are greatly needed in the case where little or no previous
information exists. They also provide a robustness check for specifying other priors.
There are problems with noninformative priors, though. A sticky problem is the
sheer number of noninformative priors available, since there is not a universally 'best'
noninformative prior. Formulas are offered for noninformative priors based on various
optimality criteria. Often a noninformative prior is optimal in one sense but not
in another. Another big problem is that some noninformative priors can lead to
posteriors which are not proper densities, that is, they cannot be integrated over $\Theta$
to yield a finite value. Bayesian inference is meaningless with the use of an improper
posterior.

An early choice of Laplace (1812) for a noninformative prior was the uniform, or
flat, prior. For example, suppose $Y$ is $N(\theta, 100)$. Then a uniform prior for $\theta$
is $\pi(\theta) = 1$. The posterior, $\Pi(\theta|y)$, is then $N(y, 100)$. Note that even though the
prior was improper, i.e. $\int \pi(\theta)\, d\theta = \infty$, the resulting posterior is proper, thus allowing
meaningful inference.
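
The propriety claim in the flat-prior example can be verified in one line. The following worked step is a sketch that uses only the quantities already defined in the example: viewed as a function of $\theta$, the product of the flat prior and the likelihood is exactly the $N(y, 100)$ density, which integrates to one.

```latex
\begin{align*}
\Pi(\theta \mid y) &\propto \pi(\theta)\, f(y \mid \theta)
  = 1 \cdot \frac{1}{\sqrt{2\pi(100)}}
    \exp\left\{-\frac{(y-\theta)^2}{2(100)}\right\}, \\
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi(100)}}
    \exp\left\{-\frac{(\theta-y)^2}{2(100)}\right\} d\theta &= 1 .
\end{align*}
```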

A drawback of the uniform prior is its lack of invariance under one-to-one transformation
of parameters for many densities. With a noninformative prior which lacks invariance,
final inference will differ according to parameterization. Efforts to derive noninformative
priors through consideration of transformation of a problem began with Jeffreys
(1961). Jeffreys' prior is perhaps the best known type of noninformative prior. The
formula for finding Jeffreys' prior is given by


$$\pi_{\text{Jeffreys}}(\theta) \propto \sqrt{|I(\theta)|},$$

where $\theta = (\theta_1, \ldots, \theta_p)$ and a typical element of $I(\theta)$, the information matrix of $\theta$, is
given by

$$I(\theta)_{ij} = -E\left[\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f(y|\theta)\right].$$

A class of noninformative prior developed along a different line of thought is the
probability matching prior. The probability matching prior was introduced by Welch
and Peers in 1963 as a vehicle to reconcile Bayesian and frequentist inference. Peers
(1965) showed that probability matching priors are solutions of certain differential
equations. The main idea of such priors is that they match Bayesian credible sets with
frequentist confidence intervals in the sense that a frequentist's confidence interval will
equal a Bayesian's credible set plus a small error. The work of Mukerjee and Dey
(1993) gives conditions where this error is of the order of $n^{-1}$, where $n$ is the sample
size. The development of a Bayesian method which can so closely approximate a
frequentist confidence interval is a boon where frequentist theory does not exist to
give the formulation for an exact or approximate confidence interval.

In 1973, Box and Tiao introduced an important optimality condition for noninformative
priors. They argue that noninformative priors should be shape preserving in
the sense that if a likelihood has a location parameter, $\theta$, then a desirable property of
a prior is that two samples $Y$ and $Y^*$ will produce posteriors differing only in location.
This criterion is referred to in the literature as data translated likelihood.

Bernardo (1979) sought a class of noninformative priors which would allow a
potentially infinite amount of data to supply the information garnered about an unknown
$\theta$. Bernardo's noninformative priors have become known as reference priors.
A neat formula is not available for the reference prior. Its definition is highly technical
and cast in a decision theoretic framework. The main idea of its role is to maximize
the contribution of the observed data.

A shortcoming of Jeffreys' prior is its performance in the presence of nuisance parameters.
In this case Jeffreys' prior can often yield estimators which are inconsistent
in a frequentist sense. As an example, Berger and Bernardo (1992a) cite the famous
Neyman-Scott problem where Jeffreys' prior leads to an inconsistent estimator of the
error variance in a fixed effects balanced one way normal ANOVA model. Berger
and Bernardo (1989) were able to use the reference prior approach to give a form of
reference priors which would perform better than the Jeffreys' prior in the presence
of nuisance parameters while remaining invariant under one-to-one transformations
of parameters.

In  this  work  we  will  group  noninformative  priors  into  three  classes:  Jeffreys'  priors, 
reference priors, and probability matching priors. These classes are not mutually
exclusive;  one  noninformative  prior  may  belong  to  more  than  one  class.  The  groupings 
reflect  the  development  of  noninformative  priors  and  will  provide  a  structure  for  the 
purposes  of  this  work.  These  classes  provide  noninformative  priors  which  are  optimal 
in  the  ways  highlighted  in  Table  1.1.  A  more  in-depth  discussion  of  noninformative 
priors  follows  in  the  literature  review. 

1.2    Literature  Review 

A  well  formulated  posterior  density  is  the  Bayesian's  tool  for  answering  practical 
questions.  Although  theory  can  give  symbolic  life  to  a  posterior  density,  its  usefulness 
depends on its propriety and the appropriateness of prior density selection. The quest
for theory to provide conditions for propriety and guidelines for selecting an "optimal
prior" is a current area of research.

When certain knowledge is available about parameters, priors can be constructed
to model prior knowledge. However, a noninformative prior is still essential for measuring
the sensitivity of final answers to pre-specification of beliefs. Such noninformative
priors must be relied on in cases where no (or minimal) prior information
is available.

Strategies  for  the  elicitation  of  an  optimal  noninformative  prior  abound.  Each 
arises  from  the  desire  that  a  noninformative  prior  exhibit  some  "reasonable"  property. 
Unfortunately,  while  a  specific  prior  may  satisfy  some  of  the  reasonable  properties, 
it is sure to violate others. Berger (1985, p. 89) writes that "Perhaps the most
embarrassing feature of noninformative priors is simply that there are often so many
of them."

Table 1.1. Comparison of noninformative prior classes

Property                              Jeffreys'      Reference      Probability Matching
------------------------------------  -------------  -------------  --------------------
Simple formulation                    [illegible]    [illegible]    [illegible]
Exists                                always         [illegible]    always
Invariance                            yes            yes            yes
Shape preserving                      yes            yes            not always
Performs well in presence of
  nuisance parameters                 not always     yes            yes
Maximizes contribution of data        yes            yes            yes
Matches frequentist confidence
  intervals with Bayesian credible
  sets                                not always     not always     yes

In spite of their wide use in applications, a pitfall of noninformative priors is that
their use may sometimes lead to improper posteriors. Ibrahim and Laud (1991) give
such an example. Consider a gamma model with identity link function and known
shape parameter $\alpha = 1$. Suppose there is only one observation $y$ from this model
with one covariate $x$, a corresponding regression coefficient $\beta$, and no intercept term.
Assuming a uniform prior on $\beta$, the posterior density is given by

$$p(\beta|x, y) \propto \frac{1}{\beta x} \exp\{-y/(\beta x)\}, \qquad \beta x > 0. \tag{1.2.1}$$

For any $y > 0$, this density is not proper.
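
The impropriety can be seen numerically: the integral of the unnormalized posterior over $(0, B)$ keeps growing, roughly like $\log(B)/x$, as the upper limit $B$ increases. The sketch below illustrates this for the assumed values $x = 1$ and $y = 1$, which are chosen only for illustration and do not come from the original; the function name `posterior_mass` is likewise hypothetical.

```python
import math


def posterior_mass(upper, x=1.0, y=1.0, steps=20_000, log_lower=-5.0):
    """Trapezoid approximation of the integral of
    (1 / (beta * x)) * exp(-y / (beta * x)) over beta in (exp(log_lower), upper),
    computed on the scale s = log(beta)."""
    s_hi = math.log(upper)
    h = (s_hi - log_lower) / steps

    def g(s):
        # With beta = exp(s), d(beta) = beta ds, so the integrand in s
        # is exp(-y / (x * e^s)) / x.
        return math.exp(-y * math.exp(-s) / x) / x

    total = 0.5 * (g(log_lower) + g(s_hi))
    for k in range(1, steps):
        total += g(log_lower + k * h)
    return total * h


# The accumulated mass never levels off as the upper limit grows.
for upper in (1e3, 1e6, 1e9):
    print(upper, posterior_mass(upper))
```

Each thousandfold increase in the upper limit adds about $\log(1000) \approx 6.9$ to the integral, so no finite normalizing constant exists.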

Ibrahim and Laud argue for the use of Jeffreys's prior when using Generalized
Linear Models (GLMs). One of the features which make Jeffreys's prior an attractive
noninformative prior is its invariance under one-to-one reparameterization. If $p$ is
Jeffreys's prior for $\phi$ and if $\zeta = f(\phi)$ is a one-to-one reparameterization of $\phi$, then
Jeffreys's prior for $\zeta$ is $p \circ f^{-1}(\zeta)\, |d f^{-1}(\zeta)/d\zeta|$. Jeffreys's prior is also noninformative in
the sense of data translated likelihood. If a likelihood function, $L_y(\phi)$, can be written
in the form

$$L_y(\phi) = f(\phi - t(y)) \tag{1.2.2}$$

then  a  desirable  property  of  a  posterior  is  that  two  different  samples  y  and  y*  will 
produce  posteriors  that  differ  only  in  respect  to  location.  The  aim  is  that  a  prior  will 
not  result  in  posteriors  with  different  shapes  for  different  samples  if  the  likelihood 
can  be  written  in  a  location  form  in  the  unknown  parameter.  This  is  the  notion  of 
'data  translated  likelihood'  introduced  by  Box  and  Tiao  (1973).  A  modification  of 
this  idea,  approximate  data  translated  likelihood,  extends  the  class  of  likelihoods  which 
satisfy  the  required  form.  Kass  (1990)  has  shown  that  Jeffreys's  prior  will  produce 
the  desired  posteriors  for  likelihoods  which  are  'approximately  data  translated'. 
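
The invariance property of Jeffreys's prior stated earlier in this passage can be checked by hand in a simple case. The worked example below is an illustration that is not taken from the original: it assumes a single observation $y \sim N(0, \sigma^2)$, with $\phi = \sigma$ and the reparameterization $\zeta = f(\sigma) = \sigma^2$.

```latex
\begin{align*}
\log f(y \mid \sigma) &= -\log\sigma - \frac{y^2}{2\sigma^2} + \text{const},
 & I(\sigma) &= -E\left[\frac{1}{\sigma^2} - \frac{3y^2}{\sigma^4}\right]
   = \frac{2}{\sigma^2},
 & p(\sigma) &\propto \frac{1}{\sigma}, \\
\log f(y \mid \zeta) &= -\tfrac{1}{2}\log\zeta - \frac{y^2}{2\zeta} + \text{const},
 & I(\zeta) &= -E\left[\frac{1}{2\zeta^2} - \frac{y^2}{\zeta^3}\right]
   = \frac{1}{2\zeta^2},
 & p(\zeta) &\propto \frac{1}{\zeta}.
\end{align*}
% Transforming the prior for sigma gives the same answer:
\[
p \circ f^{-1}(\zeta)\,\left|\frac{d f^{-1}(\zeta)}{d\zeta}\right|
  \propto \frac{1}{\sqrt{\zeta}} \cdot \frac{1}{2\sqrt{\zeta}}
  \propto \frac{1}{\zeta}.
\]
```

Computing Jeffreys's prior directly in the $\zeta$ parameterization and transforming the prior for $\sigma$ therefore lead to the same density, up to a constant.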

Jeffreys's prior has a desirable geometrical interpretation as given by Kass (1989).
Uniform measure on some parameter space (i.e., Lebesgue measure) seems, at first
glance, to be a good choice for a noninformative prior. It is objectionable, however,
because a prior that is uniform on one parameter space is not uniform on others.
Whether a noninformative prior determined by a metric is appealing depends on
whether the metric is appealing. The information metric (based on Fisher information)
is seemingly the natural choice from the point of view of asymptotics. Fisher
information defines a metric which provides a local measure of distance between members
of a family of distributions. Hence Jeffreys's prior is the noninformative prior
determined by the information metric.

Jeffreys's prior often runs into problems in multiparameter settings when only
a subset of the parameter vector is of inferential interest and the remaining components
are nuisance parameters. As mentioned already, in the Neyman-Scott problem, Jeffreys's
prior produces an inconsistent Bayes estimator (under squared error loss) of the error
variance. In another example, estimating the sum of squares of a large number of
independent normal means with a common variance, it leads to an unsatisfactory
posterior, a difficulty often referred to as Stein's paradox.

Bernardo (1979) introduced the notion of a reference prior. The idea, for an experiment
with density $f(x|\theta)$ and prior density $\pi(\theta)$, is to consider the amount of
information about $\theta$ that the experiment can be expected to provide. The reference
prior maximizes the information which exhaustive experimentation is hypothesized to
provide. The rationale is that the larger this information is, the less informative the
prior. For a variety of technical reasons, the reference prior is actually defined, not
for the experiment $f(x|\theta)$, but via an asymptotic limit of iid replications of the experiment.
In situations where asymptotic normality of the posterior holds, Bernardo
showed that the reference prior for $\theta$, provided there are no nuisance parameters, is
Jeffreys's prior $\pi(\theta) = |I(\theta)|^{1/2}$.

Berger and Bernardo (1989) expounded the reference prior approach for deriving
noninformative priors in multiparameter situations by dividing the parameter vector
into parameters of interest and nuisance parameters. This approach not only
eliminates the need for ad hoc modifications of Jeffreys's prior as suggested by Jeffreys
himself, it also results in Jeffreys's prior when the whole parameter vector is of
interest.

A rough idea of this approach is as follows: choose conditional distributions $\pi(\omega|\theta)$
(considering $\theta$ the parameter of interest and $\omega$ the nuisance parameter), form the
marginal experiment for $\theta$ by integrating out over $\omega$ with respect to $\pi(\omega|\theta)$, and find
the reference prior $\pi(\theta)$ in this marginal experiment.

The  idea  was  further  extended  and  generalized  in  a  series  of  articles  by  Berger  and 
Bernardo  (see  for  example  Berger  and  Bernardo  (1992a)),  who  suggested  splitting 
the  parameter  vector  into  multiple  groups  according  to  their  order  of  importance. 
In  another  approach,  J.  K.  Ghosh  suggested  a  noninformative  prior  which  can  be 
thought  of  as  a  complement  of  the  reference  prior.  It  is  the  so-called  reverse  reference 
prior.  The  name  stems  from  the  fact  that  in  its  derivation,  one  follows  the  algorithm 
of  Berger  and  Bernardo  by  simply  pretending  that  the  roles  of  the  parameter  of 
interest  and  the  nuisance  parameter  are  interchanged.  The  reverse-reference  prior 
was  mentioned  in  Berger's  (1992)  discussion  of  Ghosh  and  Mukerjee  (1992). 

Because  of  the  unwieldy  nature  of  the  definition  of  a  reference  prior,  we  look  to 
results  which  handily  give  us  the  form  of  the  reference  prior.  The  work  of  Datta 
and  Ghosh  (1995b)  contains  such  a  result  when  the  Fisher  information  matrix  can  be 
written  in  a  block  diagonal  form  and  when  the  parameter  space  can  be  covered  by  an 
increasing  sequence  of  nested  rectangles. 

Yet another criterion for selection of noninformative priors, originally due to Welch
and Peers (1963) and Peers (1965), and more recently popularized by Stein (1985), is
referred to as the probability matching criterion. Let $y_1, \ldots, y_n$ be iid with common
pdf $f(y|\theta, \omega)$, where $\theta$ is the parameter of interest and $\omega$ is the nuisance parameter.
A prior $\pi(\theta, \omega)$ is said to satisfy the first order probability matching criterion (FOPMC) if

$$P(\theta > \theta_{1-\alpha}(\pi, y_1, \ldots, y_n) \,|\, \theta) = \alpha + O_p(n^{-1/2}), \tag{1.2.3}$$

where $\theta_{1-\alpha}(\pi, y_1, \ldots, y_n)$ is the upper $\alpha$th percentile of the posterior distribution of $\theta$
under the prior $\pi$. Roughly speaking, (1.2.3) requires the tail probabilities of posterior
distributions to match asymptotically (in probability) the corresponding frequentist
coverage probabilities.
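
The matching idea can be illustrated by simulation in the simplest setting, where matching happens to be exact. The sketch below assumes $N(\theta, 1)$ data with no nuisance parameter and a flat prior on $\theta$ (assumptions made here for illustration, not a case analyzed in the original); the posterior is then $N(\bar{y}, 1/n)$, and the frequentist probability that $\theta$ exceeds the posterior upper $\alpha$th percentile should be close to $\alpha$. The function name and the default settings are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def upper_tail_frequency(theta=0.0, n=10, alpha=0.05, reps=20_000, seed=1):
    """Fraction of simulated samples in which theta exceeds the posterior
    upper alpha-th percentile, for N(theta, 1) data and a flat prior."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(reps):
        ybar = fmean(rng.gauss(theta, 1.0) for _ in range(n))
        # Under a flat prior the posterior is N(ybar, 1/n), so its
        # upper alpha-th percentile is ybar + z / sqrt(n).
        quantile = ybar + z / n ** 0.5
        hits += theta > quantile
    return hits / reps


print(upper_tail_frequency())
```

For models with nuisance parameters the agreement is only approximate, which is where the order of the remainder term in (1.2.3) becomes the quantity of interest.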

Peers  (1965)  showed  that  priors  satisfying  the  FOPMC  are  solutions  of  certain 
differential equations. A more rigorous and slightly more general version of Peers'
result is given in Datta and Ghosh (1995a). Normally, however, there is a wide class of
priors meeting the FOPMC, and selection of any particular member within this class
seems difficult. To overcome this difficulty, Mukerjee and Dey (1993) introduced
the notion of second order probability matching criterion (SOPMC) where (1.2.3)
holds with $O_p(n^{-1/2})$ replaced by $O_p(n^{-1})$. A prior meeting the FOPMC also meets the
SOPMC  if  it  is  also  a  solution  of  a  second  differential  equation.  Often  a  prior  meeting 
the  SOPMC  is  unique.  The  details  are  available  in  Mukerjee  and  Dey  (1993). 

Sun  and  Ye  (1996)  characterized  the  noninformative  priors  which  meet  the  SOPMC 
in  the  case  where  the  likelihood  is  a  member  of  a  certain  two-parameter  exponential 
family  of  distributions.  This  family  was  first  introduced  by  Bar-Lev  and  Reiser  (1982) 
and is distinguished by the feature of admitting UMPU (uniformly most powerful
unbiased) tests based on a single test statistic.

In  the  discussion  above  three  general  categories  of  noninformative  priors  have 
emerged:  Jeffreys'  prior,  the  reference  prior,  and  probability  matching  priors.  In 
many  situations  the  same  noninformative  prior  will  belong  to  each  category,  but  not 
always.  After  selecting  a  particular  noninformative  prior  the  task  remains  to  check 
the  posterior  distribution  for  propriety.  Ibrahim  and  Laud  (1991)  have  shown  that 
Jeffreys's  prior  can  lead  to  proper  posteriors  in  most  GLMs.  They  give  the  form  for 
a  generalized  linear  model  (GLM)  as 

$$p(y_i|\theta_i, \phi) = \exp\{a_i^{-1}(\phi)(y_i\theta_i - b(\theta_i)) + c(y_i, \phi)\}, \qquad i = 1, \ldots, n, \tag{1.2.4}$$

where $y_1, \ldots, y_n$ are independent observations. This density is parameterized by the
canonical parameter $\theta_i$ and the scale parameter $\phi$. The $a_i(\cdot)$, $b(\cdot)$, and $c(\cdot)$ are
known functions, and $a_i(\phi)$ is assumed to be of the form $a_i(\phi) = \phi/w_i$, where the
$w_i$'s are known prior weights. The $\theta_i$'s are related to the regression coefficients by the
equation

$$\theta_i = \theta(\eta_i), \qquad i = 1, \ldots, n, \tag{1.2.5}$$

where $\eta_i = x_i\beta$, $x_i = (x_{i1}, \ldots, x_{ip})$ is a $1 \times p$ vector denoting the $i$th row of the $n \times p$
matrix of covariates, $X$, $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p \times 1$ vector of regression coefficients,
and $\theta$ is a monotonic differentiable function ($\theta = t^{-1} \circ g^{-1}$, where $g(\cdot)$ is the usual
link function, and $t = b'(\cdot)$).
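
To make the notation in (1.2.4) concrete, the sketch below evaluates the GLM density for one familiar case. The Poisson choice is an assumption made for illustration and is not singled out in the original: it takes $\theta = \log\mu$, $b(\theta) = e^{\theta}$, $\phi = 1$, $w = 1$, and $c(y, \phi) = -\log y!$. The function names are hypothetical.

```python
import math


def glm_density(y, theta, phi, weight, b, c):
    """exp{ a^{-1}(phi) * (y * theta - b(theta)) + c(y, phi) },
    with a(phi) = phi / weight as in the text."""
    a_inv = weight / phi
    return math.exp(a_inv * (y * theta - b(theta)) + c(y, phi))


def poisson_c(y, phi):
    # log(1 / y!), written with lgamma so large counts do not overflow
    return -math.lgamma(y + 1.0)


def poisson_pmf_via_glm(y, mu):
    """Poisson probability of y with mean mu, computed through the GLM form."""
    return glm_density(y, math.log(mu), 1.0, 1.0, math.exp, poisson_c)


print(poisson_pmf_via_glm(3, 2.5))
```

With the canonical (log) link, $\theta_i = \eta_i$, so the function $\theta(\cdot)$ in (1.2.5) is the identity in this special case.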

The following two theorems, given by Ibrahim and Laud (1991), help
establish the propriety of the posterior under Jeffreys' prior by giving (i) sufficient
and (ii) necessary and sufficient conditions for the existence of the posterior and prior
moment generating functions, respectively. In their work they assume that the scale
parameter $\phi$ is known.

Theorem 1.2.1 Suppose the likelihood for $\beta$ and Jeffreys's prior are as above. Assume
$X$ is of full rank and the likelihood of $\beta$ is bounded above. Then a sufficient condition
for the existence of the posterior moment generating function of $\beta$ for any GLM is
that the integral

$$\int_S \exp\{\tau\theta^{-1}(r) + \phi^{-1}w(yr - b(r))\} \left(\frac{d^2 b(r)}{dr^2}\right)^{1/2} dr \tag{1.2.6}$$

is finite for $\tau$ in some open neighborhood about 0. Here $S$ denotes the parameter
space for the canonical parameter $\theta$.

Theorem 1.2.2 A necessary and sufficient condition for existence of the moment generating
function of Jeffreys's prior for any GLM is that the integral

$$\int_S \exp(\tau\theta^{-1}(r)) \left(\frac{d^2 b(r)}{dr^2}\right)^{1/2} dr \tag{1.2.7}$$

is finite for $\tau$ in some open neighborhood about 0.

At the heart of the proofs of the above theorems lies a linear algebra formula for
determinant expansions, the Cauchy-Binet Theorem. Application of this expansion
formula allows for reduction of the problem to checking a streamlined integral for
convergence. The often unwieldy determinant is disposed of.

With the flavor of GLMs, dispersion models (introduced by Jørgensen (1992)) are
of the form

$$p(y; \mu, \lambda) = a(\lambda, y) \exp\{\lambda t(y, \mu)\}, \qquad y \in \mathbb{R}, \tag{1.2.8}$$

where $a$ and $t$ are given functions, $\lambda > 0$ and $\mu$ varies in an interval of the real line.
The dispersion parameter is defined to be $\sigma^2 = 1/\lambda$. The parameter $\mu$ is a generalized
location parameter. Dispersion models may be viewed as generalizations of GLMs
where the exponentiated function is not required to be linear in the response variable.
The subtype of dispersion model whose likelihood has the form of the random
component of the GLM is referred to as exponential dispersion models or EDMs.
Interesting distributions which may be modeled as dispersion models include Student's
t-distribution, Laplace's distribution, and the Fisher-von Mises distribution. The
Fisher-von Mises distribution is especially useful for modeling observations that lie
on a circle, or more generally for directions. Mardia (1979) contains a wealth of theory
about the Fisher-von Mises distribution. The inverse Gaussian distribution is yet
another intriguing member of the dispersion model family. Chhikara and Folks (1989)
have unified the known theory available about the inverse Gaussian distribution.
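
As a concrete reading of (1.2.8), the sketch below writes two familiar densities in the form $a(\lambda, y)\exp\{\lambda t(y, \mu)\}$. The specific choices of $a$ and $t$ are the standard parameterizations of the normal (variance $1/\lambda$) and the inverse Gaussian (mean $\mu$, shape $\lambda$); they are stated here as assumptions for illustration, since the original defers the definitions to Chapter 2, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def dispersion_density(y, mu, lam, a, t):
    """Evaluate a(lam, y) * exp(lam * t(y, mu)), the form in (1.2.8)."""
    return a(lam, y) * math.exp(lam * t(y, mu))


# Normal with mean mu and variance sigma^2 = 1 / lam.
def normal_a(lam, y):
    return math.sqrt(lam / (2.0 * math.pi))


def normal_t(y, mu):
    return -0.5 * (y - mu) ** 2


# Inverse Gaussian with mean mu and shape lam, defined for y > 0.
def inv_gauss_a(lam, y):
    return math.sqrt(lam / (2.0 * math.pi * y ** 3))


def inv_gauss_t(y, mu):
    return -((y - mu) ** 2) / (2.0 * mu ** 2 * y)


print(dispersion_density(1.3, 1.0, 4.0, normal_a, normal_t))
print(dispersion_density(0.8, 1.0, 2.0, inv_gauss_a, inv_gauss_t))
```

In both cases $\mu$ enters only through $t(y, \mu)$ and $\lambda$ multiplies it, which is the structure that the priors discussed in this work exploit.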

The key elements of this dissertation have now been introduced. We have three
general categories of noninformative priors, a rich class of likelihood functions in
the form of Jørgensen dispersion models, and Ibrahim and Laud's preliminary result
establishing the propriety of certain posteriors for regression models. The challenge
is to unify the work which has preceded and broaden the envelope of what is known.

1.3    Overview

Jørgensen (1992) introduced a general class of probability models which he referred
to as dispersion models. This class has a wide membership which includes exponential
dispersion models (henceforth referred to as EDMs), the Student's t-distribution, the
power family distributions, and also the Fisher-von Mises distribution, the last named
distribution being widely used for the analysis of directional data. Also, as noted in
Jørgensen (1992), the membership of the EDM is much wider than just the natural
exponential family of distributions.

Jørgensen (1992) provided a comprehensive frequentist analysis of dispersion models.
The present work attempts instead a Bayesian analysis of dispersion models.
To this end, we have used certain noninformative priors including the widely
used Jeffreys' prior as well as the different reference priors of Berger and Bernardo
(1989, 1992a, 1992b). An introduction to noninformative priors and the Bayesian
method is contained in Chapter 1, along with a review of the literature.

The organization of the remaining sections is as follows. In Chapter 2, Jørgensen's
dispersion models are defined and formulas are developed for two classes of noninformative
priors considered in this work: Jeffreys' prior and the reference prior. Formulation
for the probability matching priors (in the case of Jørgensen's dispersion model
likelihoods) is given in Chapter 3.

In Chapter 2, the two parameter exponential family of Bar-Lev and Reiser (1982)
is shown to be a subset of Jørgensen's dispersion model family. The results of Sun
and Ye (1996) for members of the exponential family of Bar-Lev and Reiser can thus
be viewed as special cases of the general results established for Jørgensen's dispersion
models in the present work.

Also in Chapter 2, we have proved the propriety of the posterior under Jeffreys'
and reference priors for specific members of the EDM family, namely the normal, the
inverse Gaussian, and the gamma. Similar results are also proved for the Fisher-von
Mises distribution, the power distribution, and the t-distribution.

Chapter 3 compares noninformative priors under a criterion of matching asymptotically
(as the sample size tends to infinity) the Bayesian coverage probabilities
based on posterior quantiles with the corresponding frequentist probabilities up to
order $O_p(n^{-1/2})$ (FOPMC) and $O_p(n^{-1})$ (SOPMC). A prior meeting the FOPMC will
be called a "first order optimal" prior. It will be shown in Chapter 3 that for a
typical member of the general dispersion family, reference priors are almost always
first order optimal, but Jeffreys' priors sometimes are not. To select among the first
order optimal priors, we bring in the "second order optimal" priors, those meeting the
SOPMC. Theorem 3.1.1 provides characterizations of second order optimal priors.
One  interesting  example  is  the  Fisher-von  Mises  distribution  where  a  new  second 
order  optimal  prior  is  found  different  from  the  Jeffreys'  or  reference  priors.  Another 
new  second  order  optimal  prior  is  found  for  the  Student's  t  distribution.  Theorem 
3.3.1  characterizes  which  EDM  subtypes  have  second  order  optimal  priors. 

Chapter 3 also reports a limited simulation study for the lognormal, inverse Gaussian,
and Fisher-von Mises distributions. The calculations indicate that the unique
priors meeting the SOPMC often meet the target coverage probability for very small
sample sizes. Also in this chapter, we use the methods of the previous sections
for the analysis of two real data sets.

In Chapter 4, we develop a dispersion regression model and give the form of
Jeffreys' and reference priors for this case. We extend the work of Ibrahim and Laud
(1991) by giving a sufficient condition ensuring the propriety of posteriors formulated
from the dispersion regression model. We also establish the propriety of posteriors
in the case of the inverse Gaussian and power distributions. Chapter 4 concludes with
an illustration of the theory via a real data analysis.

Chapter  5  contains  a  summary  and  some  ideas  for  further  research. 


CHAPTER  2 

BAYESIAN  INFERENCE  FOR  DISPERSION  MODELS


2.1    Dispersion  Models 

In  this  chapter  we  will  establish  some  general  results  for  dispersion  models  and 
show  for  seven  examples  that  posteriors  under  certain  noninformative  priors  are 
proper. 

2.1.1  Definitions 

Jørgensen (1992) introduced a general class of probability models which he referred
to as dispersion models. A density, $f(y \mid \mu, \lambda)$, is a dispersion model if it can be written
as

\[ f(y \mid \mu, \lambda) = a(\lambda, y)\, e^{\lambda t(y, \mu)} \]

for some functions $a(\cdot)$ and $t(\cdot)$. In a monograph (1996) Jørgensen introduces an
important subtype of the dispersion model, the Jørgensen proper dispersion model,
which is of the form

\[ f(y \mid \mu, \lambda) = a(\lambda)\, b(y)\, e^{\lambda t(y, \mu)}. \]

In all the examples considered in this work, the densities will be of the Jørgensen
proper dispersion model form.

A density, $f(y \mid \mu, \lambda)$, is a dispersion model with location if it can be written as

\[ f(y \mid \mu, \lambda) = a(\lambda)\, e^{\lambda t(y - \mu)}, \]

where $a(\lambda)^{-1} = \int e^{\lambda t(y)}\, dy$.

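A special class of dispersion models, exponential dispersion models (EDMs), are
densities of the form

\[ f(y \mid \theta, \lambda) = a(\lambda, y)\, e^{\lambda(\theta y - \kappa(\theta))}. \tag{2.1.1} \]

The mean, $\mu$, of an EDM is related to $\theta$, the canonical parameter, by

\[ \mu = \kappa'(\theta). \]

Following Jørgensen's (1992) notation we define $\tau = \kappa'$, so that $\theta = \tau^{-1}(\mu)$.
$V(\mu)$ denotes the variance function,

\[ V(\mu) = \kappa''(\theta). \]

The variance of $Y$, $\mathrm{Var}(Y)$, is a function of $\lambda$ and $V(\mu)$,

\[ \mathrm{Var}(Y) = \frac{V(\mu)}{\lambda}. \]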

Jørgensen explains that in order to distinguish between the random and systematic
components of a generalized linear model, he employs the term exponential dispersion
model, a terminology which reflects the partly exponential form of (2.1.1) and the
important role played by the dispersion parameter $\lambda$. The formulation of the dispersion
model allows for the inference technique of analysis of deviance which parallels the
analysis of variance for normal data. See Jørgensen (1992) for more details.
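As a small numerical illustration of the relation $\mathrm{Var}(Y) = V(\mu)/\lambda$, the sketch below uses the gamma density in the dispersion-model form given later in Section 2.2.3, for which $V(\mu) = \mu^2$. The function names, the integration range and the midpoint rule are illustrative assumptions, not part of the original text.

```python
import math

def gamma_dispersion_pdf(y, mu, lam):
    """Gamma density written as a(lam) * b(y) * exp(lam * t(y, mu))."""
    log_a = lam * math.log(lam) - math.lgamma(lam)   # a(lam) = lam^lam / Gamma(lam)
    log_b = -math.log(y)                             # b(y) = 1 / y
    t = -(y / mu - math.log(y / mu))                 # t(y, mu)
    return math.exp(log_a + log_b + lam * t)

def mass_mean_variance(mu, lam, steps=100_000):
    """Midpoint-rule estimates of total mass, mean and variance on (0, 40 * mu)."""
    upper = 40.0 * mu          # assumed cut-off; the tail beyond it is negligible here
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        w = gamma_dispersion_pdf(y, mu, lam) * h
        m0 += w
        m1 += w * y
        m2 += w * y * y
    return m0, m1, m2 - m1 * m1

mu, lam = 2.0, 3.0
mass, mean, var = mass_mean_variance(mu, lam)
print(mass, mean, var, mu ** 2 / lam)   # var should be close to V(mu) / lam
```

The same check could be repeated for any other EDM once its variance function is known.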

2.1.2    Information  Matrices 

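For a density in the dispersion model family,

\[ \log f(y \mid \mu, \lambda) = \log a(\lambda, y) + \lambda t(y, \mu). \]

Since

\[ 0 = E\left[\frac{\partial \log f(y)}{\partial \mu}\right] = \lambda E\left[\frac{\partial t(y, \mu)}{\partial \mu}\right], \]

we have that

\[ E\left[\frac{\partial^2 \log f(y)}{\partial \mu\, \partial \lambda}\right] = E\left[\frac{\partial t(y, \mu)}{\partial \mu}\right] = 0. \]

So for dispersion models, the information matrix is

\[ I(\mu, \lambda) = \begin{pmatrix} -\lambda E\left[\dfrac{\partial^2 t(y, \mu)}{\partial \mu^2}\right] & 0 \\ 0 & -E\left[\dfrac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2}\right] \end{pmatrix}. \]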

The orthogonality between the parameters $\mu$ and $\lambda$ will be exploited in later sections
when various noninformative priors are considered.

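In the case of Jørgensen proper dispersion models and dispersion models with
location, we know more about the form of $I(\mu, \lambda)$. We have

\[ I_{location}(\mu, \lambda) = \begin{pmatrix} -\lambda E(t''(y)) & 0 \\ 0 & -\dfrac{\partial^2 \log a(\lambda)}{\partial \lambda^2} \end{pmatrix}. \]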

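For an EDM,

\[ \log f(y \mid \theta, \lambda) = \log a(\lambda, y) + \lambda(\theta y - \kappa(\theta)), \]

and

\[ \frac{\partial \log f(y \mid \theta, \lambda)}{\partial \mu} = \frac{\partial \log f(y \mid \theta, \lambda)}{\partial \theta} \frac{\partial \theta}{\partial \mu} = \lambda (y - \kappa'(\theta)) \frac{\partial \theta}{\partial \mu} = \lambda (y - \mu) \frac{\partial \theta}{\partial \mu}. \tag{2.1.2} \]

Thus

\[ -E\left[\frac{\partial^2 \log f(y)}{\partial \mu^2}\right] = \lambda \frac{\partial \theta}{\partial \mu} = \frac{\lambda}{V(\mu)}. \]

The last equality holds because

\[ \frac{\partial \mu}{\partial \theta} = \frac{\partial \kappa'(\theta)}{\partial \theta} = \kappa''(\theta) = V(\mu). \]

Since

\[ \frac{\partial \log f(y \mid \theta, \lambda)}{\partial \lambda} = \frac{\partial \log a(\lambda, y)}{\partial \lambda} + (\theta y - \kappa(\theta)), \]

we have

\[ -E\left[\frac{\partial^2 \log f(y \mid \theta, \lambda)}{\partial \lambda^2}\right] = -E\left[\frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2}\right]. \]

We can then write the information matrix for an EDM as

\[ I_{EDM}(\mu, \lambda) = \begin{pmatrix} \dfrac{\lambda}{V(\mu)} & 0 \\ 0 & -E\left[\dfrac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2}\right] \end{pmatrix}. \]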

2.1.3    Jeffreys'  and  Reference  Priors 


General  dispersion  model  form  of  priors 


Denote the information matrix for the dispersion model by

\[ I(\mu, \lambda) = \begin{pmatrix} I_{11} & 0 \\ 0 & I_{22} \end{pmatrix}. \]

Recall that Jeffreys' prior is proportional to the square root of the determinant of the
information matrix. Hence

\[ \pi_{Jeffreys'} \propto |I_{11} I_{22}|^{1/2}. \]

In the presence of a nuisance parameter, the reference prior proves to be the better
candidate for a noninformative prior choice. Deriving the reference prior for any
given density can be a daunting task. Here the orthogonality of the parameters
renders the information matrix block diagonal, which greatly simplifies our
task.

Theorem 2.1.1 Suppose $\mu$ or $\lambda$ is considered the parameter of interest, while the other
is regarded as a nuisance parameter. Suppose $I_{11}$, $I_{22}$ can be factored into functions
of $\mu$ and $\lambda$. Say,

\[ I_{11} \propto h_{11}(\mu)\, h_{12}(\lambda) \tag{2.1.3} \]

\[ I_{22} \propto h_{21}(\lambda)\, h_{22}(\mu), \tag{2.1.4} \]

where we assume the $h_{ij}$'s $> 0$. Assume that the 2 dimensional parameter space
with dimensions representing the $\mu$ and $\lambda$ directions can be covered by rectangular
compact sets. Then

\[ \pi_{ref(\mu,\lambda)} \propto \pi_{reverse-ref(\mu,\lambda)} \propto h_{11}^{1/2}(\mu)\, h_{21}^{1/2}(\lambda). \tag{2.1.5} \]

When both parameters are of interest, the reference and reverse-reference
priors are identical to Jeffreys' prior and given by

\[ \pi_{Jeffreys'} \propto h_{11}^{1/2}(\mu)\, h_{12}^{1/2}(\lambda)\, h_{21}^{1/2}(\lambda)\, h_{22}^{1/2}(\mu). \tag{2.1.6} \]

Proof  of  Theorem  2.1.1  Since  the  information  matrix  is  block  diagonal,  the  theorem 
is  proved  by  a  direct  application  of  a  result  due  to  Datta  and  Ghosh  (1995b). 

Remark 2.1.1 We are using the notation $ref(\mu, \lambda)$ to emphasize that the prior arises
from the natural parameterization of the parameter space, i.e. a nested increasing
sequence of rectangles. These rectangles are such that each side of a rectangle is
parallel to either the $\mu$-axis or the $\lambda$-axis. Since other reference priors may exist
under other parameterizations, as Liseo (1993) has shown, we indicate with notation
which parameterization has been used for the construction of the reference prior under
consideration.
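The factorization used in Theorem 2.1.1 translates directly into a recipe: given the four factors $h_{11}, h_{12}, h_{21}, h_{22}$, both priors follow by taking square roots. The sketch below illustrates this for the normal model, using the factors recorded later in Section 2.2.1; the helper names and the dictionary layout are hypothetical.

```python
import math

def jeffreys_prior(h11, h12, h21, h22):
    """Unnormalized Jeffreys' prior sqrt(h11(mu) h12(lam) h21(lam) h22(mu))."""
    return lambda mu, lam: math.sqrt(h11(mu) * h12(lam) * h21(lam) * h22(mu))

def reference_prior(h11, h21):
    """Unnormalized reference prior sqrt(h11(mu) h21(lam))."""
    return lambda mu, lam: math.sqrt(h11(mu) * h21(lam))

# Factors for the normal model (taken from Section 2.2.1, up to constants).
normal_factors = {
    "h11": lambda mu: 1.0,
    "h12": lambda lam: lam,
    "h21": lambda lam: lam ** -2,
    "h22": lambda mu: 1.0,
}

pi_jeffreys = jeffreys_prior(**normal_factors)
pi_reference = reference_prior(normal_factors["h11"], normal_factors["h21"])

# For the normal model these reduce to lam^(-1/2) and lam^(-1) respectively.
print(pi_jeffreys(0.3, 4.0), pi_reference(0.3, 4.0))
```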

Form  of  priors  for  dispersion  model  subtypes 

Two distinct noninformative priors have emerged from our investigations thus
far. These are Jeffreys' prior and the reference prior in the presence of a nuisance
parameter. Henceforth the notation $\pi_{Jeffreys'}$ will denote Jeffreys' prior, and $\pi_{ref(\mu,\lambda)}$ will
denote the reference prior in the presence of a nuisance parameter. Assuming (2.1.3)
and (2.1.4) hold, we may specify forms for Jeffreys' prior and the reference prior. In
the case of the general dispersion model

\[ \pi_{Jeffreys'} \propto \sqrt{h_{11}(\mu)\, h_{12}(\lambda)\, h_{21}(\lambda)\, h_{22}(\mu)} \tag{2.1.7} \]

and

\[ \pi_{ref(\mu,\lambda)} \propto \sqrt{h_{11}(\mu)\, h_{21}(\lambda)}. \tag{2.1.8} \]

In the case of the dispersion model with location and Jørgensen proper dispersion
models, $h_{22}(\mu) = 1$ and $h_{21}(\lambda) = -\dfrac{d^2 \log a(\lambda)}{d\lambda^2}$, so

\[ \pi_{Jeffreys'} \propto \sqrt{-h_{11}(\mu)\, h_{12}(\lambda)\, \frac{d^2 \log a(\lambda)}{d\lambda^2}} \]

and

\[ \pi_{ref(\mu,\lambda)} \propto \sqrt{-h_{11}(\mu)\, \frac{d^2 \log a(\lambda)}{d\lambda^2}}. \]

For EDMs,

\[ h_{11}(\mu) = \frac{1}{V(\mu)} \quad \text{and} \quad h_{12}(\lambda) = \lambda, \]

so

\[ \pi_{Jeffreys'} \propto \sqrt{\frac{\lambda}{V(\mu)}\, h_{21}(\lambda)\, h_{22}(\mu)} \]

and

\[ \pi_{ref(\mu,\lambda)} \propto \sqrt{\frac{h_{21}(\lambda)}{V(\mu)}}. \]
2.1.4    The  Bar-Lev  and  Reiser  Exponential  Family

Bar-Lev and Reiser (1982) introduced a two parameter exponential family which
has the property of admitting UMPU tests for one of the parameters based on a single
test statistic. The definition of this sub-family is as follows. Let $f(y \mid \theta_1, \theta_2)$ be a pdf
of the form

\[ f(y \mid \theta_1, \theta_2) = b(y) \exp\left[\theta_1 u_1(y) + \theta_2 u_2(y) + c(\theta_1, \theta_2)\right]. \tag{2.1.9} \]

Then $f(y \mid \theta_1, \theta_2)$ belongs to the Bar-Lev and Reiser exponential family if there exists
a differentiable function $\varphi$ such that

\[ \theta_2 = -\theta_1 \varphi'(\eta), \tag{2.1.10} \]

where $\eta = E[u_2(y)]$. The following theorem establishes that the Bar-Lev and Reiser
exponential family is a subtype of the Jørgensen dispersion model family.

Theorem 2.1.2 The Bar-Lev and Reiser two parameter exponential family forms a
subset of the wider class of Jørgensen dispersion models.

Proof of Theorem 2.1.2 Let $f(y \mid \theta_1, \theta_2)$ belong to the Bar-Lev and Reiser exponential
family. Lemma 3.1 of Bar-Lev and Reiser (1982) gives that

\[ c(\theta_1, \theta_2) = \theta_1 \eta \varphi'(\eta) - \theta_1 \varphi(\eta) - M(\theta_1), \tag{2.1.11} \]

for some function $M(\cdot)$. We proceed by showing that $f(y \mid \theta_1, \theta_2)$ can be written as a
Jørgensen dispersion model. If we let $\lambda = \theta_1$ and $\mu = \eta$, we can rewrite (2.1.9) using
(2.1.10) and (2.1.11) as

\[ f(y \mid \mu, \lambda) = b(y) \exp\left[\lambda u_1(y) - \lambda \varphi'(\mu) u_2(y) + \lambda \mu \varphi'(\mu) - \lambda \varphi(\mu) - M(\lambda)\right] \]

\[ = b(y) \exp\left[\lambda \left\{u_1(y) - \varphi'(\mu) u_2(y) + \mu \varphi'(\mu) - \varphi(\mu)\right\} - M(\lambda)\right]. \tag{2.1.12} \]

Let

\[ t(y, \mu) = u_1(y) - \varphi'(\mu) u_2(y) + \mu \varphi'(\mu) - \varphi(\mu) \tag{2.1.13} \]

and

\[ \log a(\lambda) = -M(\lambda), \tag{2.1.14} \]

to verify that $f(y \mid \theta_1, \theta_2)$ can be written in dispersion model form.
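As a sanity check of the construction in the proof, the normal density can be viewed as a member of the Bar-Lev and Reiser family with $\theta_1 = \lambda$, $u_1(y) = -y^2/2$, $\theta_2 = \lambda\mu$ and $u_2(y) = y$, so that (2.1.10) gives $\varphi'(\mu) = -\mu$ and $\varphi(\mu) = -\mu^2/2$. The sketch below, written under the assumption that $t(y,\mu) = u_1(y) - \varphi'(\mu) u_2(y) + \mu\varphi'(\mu) - \varphi(\mu)$, confirms that this recovers $t(y,\mu) = -\frac{1}{2}(y-\mu)^2$. The helper name is hypothetical.

```python
def t_from_bar_lev_reiser(u1, u2, phi, dphi):
    """Build t(y, mu) = u1(y) - phi'(mu) u2(y) + mu phi'(mu) - phi(mu)."""
    def t(y, mu):
        return u1(y) - dphi(mu) * u2(y) + mu * dphi(mu) - phi(mu)
    return t

# Normal model: phi(mu) = -mu^2 / 2, so phi'(mu) = -mu.
t_normal = t_from_bar_lev_reiser(
    u1=lambda y: -0.5 * y * y,
    u2=lambda y: y,
    phi=lambda mu: -0.5 * mu * mu,
    dphi=lambda mu: -mu,
)

for y, mu in [(0.0, 1.0), (2.5, -0.7), (-1.2, 3.4)]:
    print(t_normal(y, mu), -0.5 * (y - mu) ** 2)   # the two columns should agree
```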

2.2    Establishing  Propriety  of  Posteriors 

Dispersion models cover a wide range of densities used for modeling real data. All
EDMs belong to this family. Many other useful densities enjoy membership. These
include the Student's t, von Mises, and power densities. We will especially consider
the normal, lognormal, gamma, inverse Gaussian, Fisher-von Mises, Student's t, and power
densities. In this section we will check the posterior distribution for propriety under
both Jeffreys' prior and the reference prior.

2.2.1  Normal 

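Consider the parameterization of the normal density given below

\[ f(y \mid \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi}}\, e^{-\frac{\lambda}{2}(y - \mu)^2}, \]

$-\infty < y < \infty$, $-\infty < \mu < \infty$, $\lambda > 0$.

Writing $f(y \mid \mu, \lambda)$ in dispersion model form, we associate $\sqrt{\lambda/(2\pi)}$ with $a(\lambda, y)$ and
$-\frac{1}{2}(y - \mu)^2$ with $t(y, \mu)$. We proceed by calculating $I_{11}$ and $I_{22}$. Since

\[ \frac{\partial t(y, \mu)}{\partial \mu} = y - \mu \]

and

\[ \frac{\partial^2 t(y, \mu)}{\partial \mu^2} = -1, \]

we have

\[ -E\left[\frac{\partial^2 t(y, \mu)}{\partial \mu^2}\right] = 1. \]

Also, since

\[ \frac{\partial \log a(\lambda, y)}{\partial \lambda} = \frac{1}{2\lambda} \]

and

\[ \frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2} = -\frac{1}{2\lambda^2}, \]

we have

\[ -E\left[\frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2}\right] = \frac{1}{2\lambda^2}. \]

Thus $I_{11} = \lambda$ and $I_{22} = \frac{1}{2\lambda^2}$. Hence, up to constant factors, $h_{11}(\mu) = 1$, $h_{12}(\lambda) = \lambda$,
$h_{21}(\lambda) = \lambda^{-2}$, and $h_{22}(\mu) = 1$. Suppose $Y_1, \ldots, Y_n$ are independently and identically
distributed with a normal density. Let $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ denote the posterior under
Jeffreys' prior with data denoted by $y$. Then

\[ \Pi_{Jeffreys'}(\mu, \lambda \mid y) \propto \pi_{Jeffreys'}(\mu, \lambda) \times \lambda^{n/2} e^{-\frac{\lambda}{2}\sum_{i=1}^n (y_i - \mu)^2}. \]

Since $\pi_{Jeffreys'} \propto \lambda^{-1/2}$, we have

\[ \Pi_{Jeffreys'}(\mu, \lambda \mid y) \propto \lambda^{(n-1)/2} e^{-\frac{\lambda}{2}\sum_{i=1}^n (y_i - \mu)^2}. \]

We recognize $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ as a normal density in $\mu$, so that

\[ \int_{-\infty}^{\infty} \Pi_{Jeffreys'}(\mu, \lambda \mid y)\, d\mu \propto \lambda^{n/2 - 1} e^{-\frac{\lambda}{2}\sum_{i=1}^n (y_i - \bar{y})^2}, \]

which is a gamma density in $\lambda$. The propriety of $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ is thus assured as
long as the observations are not all identical (otherwise the gamma density will be degenerate).
The proof in the case of the reference prior is similar.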
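The last step can be illustrated numerically. Under Jeffreys' prior the marginal posterior kernel of $\lambda$ is $\lambda^{n/2-1} e^{-\lambda S/2}$ with $S = \sum (y_i - \bar{y})^2$, whose integral is $\Gamma(n/2)(2/S)^{n/2}$ when $S > 0$. The sketch below compares a midpoint-rule estimate with this closed form and shows the degenerate case $S = 0$; the data values, function names and integration range are illustrative assumptions.

```python
import math

def sum_sq_dev(y):
    """S = sum of squared deviations from the sample mean."""
    ybar = sum(y) / len(y)
    return sum((v - ybar) ** 2 for v in y)

def closed_form_mass(y):
    """Integral of lam^(n/2 - 1) exp(-lam S / 2) over (0, inf); infinite if S = 0."""
    n, s = len(y), sum_sq_dev(y)
    if s == 0.0:
        return math.inf
    return math.gamma(n / 2.0) * (2.0 / s) ** (n / 2.0)

def numeric_mass(y, upper=60.0, steps=200_000):
    """Midpoint-rule estimate of the same integral on (0, upper)."""
    n, s = len(y), sum_sq_dev(y)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        total += lam ** (n / 2.0 - 1.0) * math.exp(-0.5 * lam * s)
    return total * h

data = [1.2, 0.7, 2.5, 1.9, 3.1]          # hypothetical sample
exact = closed_form_mass(data)
approx = numeric_mass(data)
print(exact, approx)
print(closed_form_mass([2.0, 2.0, 2.0]))  # identical observations: improper
```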

2.2.2  Lognormal 

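Consider the parameterization of the lognormal density given below

\[ f(y \mid \mu, \lambda) = \frac{1}{y}\sqrt{\frac{\lambda}{2\pi}}\, e^{-\frac{\lambda}{2}(\log y - \mu)^2}, \]

$y > 0$, $-\infty < \mu < \infty$, $\lambda > 0$.

Writing $f(y \mid \mu, \lambda)$ in dispersion model form, we associate $\frac{1}{y}\sqrt{\lambda/(2\pi)}$ with $a(\lambda, y)$ and
$-\frac{1}{2}(\log y - \mu)^2$ with $t(y, \mu)$. We proceed by calculating $I_{11}$ and $I_{22}$. Since

\[ \frac{\partial t(y, \mu)}{\partial \mu} = \log y - \mu \]

and

\[ \frac{\partial^2 t(y, \mu)}{\partial \mu^2} = -1, \]

we have

\[ -E\left[\frac{\partial^2 t(y, \mu)}{\partial \mu^2}\right] = 1. \]

Also, since

\[ \frac{\partial \log a(\lambda, y)}{\partial \lambda} = \frac{1}{2\lambda} \]

and

\[ \frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2} = -\frac{1}{2\lambda^2}, \]

we have

\[ -E\left[\frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2}\right] = \frac{1}{2\lambda^2}. \]

Thus $I_{11} = \lambda$ and $I_{22} = \frac{1}{2\lambda^2}$. Hence, up to constant factors, $h_{11}(\mu) = 1$, $h_{12}(\lambda) = \lambda$,
$h_{21}(\lambda) = \lambda^{-2}$, and $h_{22}(\mu) = 1$. The lognormal example follows as a special case of the
normal example by making a log-transformation of the original variables.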

2.2.3    Gamma

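Consider the gamma distribution with probability density function of the form

\[ f(y \mid \mu, \lambda) = \frac{\lambda^{\lambda}}{\Gamma(\lambda)\, y} \exp\left[-\lambda\left\{\frac{y}{\mu} - \log\frac{y}{\mu}\right\}\right] \tag{2.2.15} \]

with $\mu > 0$ and $\lambda > 0$. The information matrix in this case is given by

\[ I(\mu, \lambda) = \mathrm{Diag}\left(\lambda \mu^{-2},\ \frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right). \tag{2.2.16} \]

Thus $h_{11}(\mu) = \mu^{-2}$, $h_{12}(\lambda) = \lambda$, $h_{21}(\lambda) = \frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}$, and $h_{22}(\mu) = 1$. We then
have

\[ \pi_{Jeffreys'} \propto \lambda^{1/2} \mu^{-1} \left(\frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right)^{1/2}; \]

\[ \pi_{ref(\mu,\lambda)} \propto \pi_{reverse-reference} \propto \mu^{-1} \left(\frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right)^{1/2}. \]

Suppose $y = (Y_1, \ldots, Y_n)$ are independently and identically distributed with a gamma
pdf as given in (2.2.15). The likelihood is then given by

\[ L(\mu, \lambda \mid y) = \frac{\lambda^{n\lambda}}{\Gamma^n(\lambda) \prod_{i=1}^n y_i} \exp\left[-\lambda \sum_{i=1}^n \left\{\frac{y_i}{\mu} - \log\frac{y_i}{\mu}\right\}\right]. \tag{2.2.17} \]

Let

\[ \pi_{\alpha,\beta}(\mu, \lambda) \propto \lambda^{\alpha} \mu^{-1} \left(\frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right)^{\beta} \]

and let $\Pi_{\alpha,\beta}(\mu, \lambda \mid y)$ denote the posterior under this prior with data denoted by $y$.
We will show that the posterior is proper under $\pi_{\alpha,\beta}$ for appropriate values of $\alpha$ and
$\beta$. The propriety of the posterior under Jeffreys' and the reference prior will then be
established as special cases. We proceed by calculating the marginal posterior of $\Pi_{\alpha,\beta}$
with respect to $\lambda$. First writing $q = \prod_{i=1}^n y_i$ and $t = \sum_{i=1}^n y_i$, the joint posterior of $\mu$
and $\lambda$ is

\[ \Pi_{\alpha,\beta}(\mu, \lambda \mid y) \propto \lambda^{n\lambda + \alpha} \left(\frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right)^{\beta} \Gamma^{-n}(\lambda)\, q^{\lambda} \mu^{-n\lambda - 1} \exp\left(-\frac{\lambda t}{\mu}\right). \tag{2.2.18} \]

Next, integrating with respect to $\mu$, $\lambda$ has the posterior

\[ \Pi_{\alpha,\beta}(\lambda \mid y) \propto \lambda^{n\lambda + \alpha} \left(\frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right)^{\beta} \Gamma^{-n}(\lambda)\, q^{\lambda}\, \Gamma(n\lambda)\, \lambda^{-n\lambda} t^{-n\lambda} \]

\[ = \lambda^{\alpha} \left(\frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right)^{\beta} \Gamma^{-n}(\lambda)\, q^{\lambda}\, \Gamma(n\lambda)\, t^{-n\lambda}. \tag{2.2.19} \]

Now use the relation given in Watson and Whittaker (1958, p. 250)

\[ \frac{d^2}{d\lambda^2}\log\Gamma(\lambda) = \sum_{k=0}^{\infty} \frac{1}{(\lambda + k)^2} \]

and the facts that $\Gamma(n\lambda) = \frac{\Gamma(n\lambda + 1)}{n\lambda}$ and $\Gamma(\lambda) = \frac{\Gamma(\lambda + 1)}{\lambda}$ to see that

\[ \Pi_{\alpha,\beta}(\lambda \mid y) \propto \lambda^{\alpha + n - 1} \left(\frac{1}{\lambda^2} - \frac{1}{\lambda} + \sum_{k=1}^{\infty} \frac{1}{(\lambda + k)^2}\right)^{\beta} \Gamma^{-n}(\lambda + 1)\, \Gamma(n\lambda + 1)\, q^{\lambda} t^{-n\lambda}. \tag{2.2.20} \]

Near zero, $\Pi_{\alpha,\beta}(\lambda \mid y)$ behaves as

\[ \lambda^{\alpha + n - 1} \left(\lambda^{-2} + c\right)^{\beta}, \tag{2.2.21} \]

where $c$ is a constant. Thus if

\[ n + \alpha - 2\beta - 1 > 0, \tag{2.2.22} \]

$\Pi_{\alpha,\beta}(\lambda \mid y)$ is well behaved near zero.

Now we investigate the behavior of $\Pi_{\alpha,\beta}(\lambda \mid y)$ for large $\lambda$. For large $\lambda$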


(2.2.23) 


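By Stirling's formula

\[ \Gamma(\lambda) \approx \sqrt{2\pi}\, \lambda^{\lambda - 1/2} e^{-\lambda} \tag{2.2.24} \]

for large $\lambda$. Thus

\[ \Gamma^{-n}(\lambda)\, \Gamma(n\lambda) \approx (2\pi)^{(1-n)/2}\, n^{n\lambda - 1/2}\, \lambda^{(n-1)/2}. \tag{2.2.25} \]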


Since $n^n q < t^n$ we have that


<  n 


(2.2.26) 


hence  for  large  A 


(2.2.27) 


for some constant $M$ depending on $y$, but not dependent on $\lambda$. For large $\lambda$, $\Pi_{\alpha,\beta}(\lambda \mid y)$
is proportional to a gamma density and propriety is obtained when

(2.2.28)

For Jeffreys' prior $\alpha = 1/2$ and $\beta = 1/2$. For the posterior under the reference prior
$\alpha = 0$ and $\beta = 1/2$. Conditions (2.2.22) and (2.2.28) imply that the posterior under
Jeffreys' prior will converge if $n > 1$ and the posterior under the reference prior will
converge if $n > 2$.

2.2.4    Inverse  Gaussian

Consider the parameterization of the inverse Gaussian (IG) density given below

\[ f(y \mid \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi y^3}} \exp\left[-\frac{\lambda (y - \mu)^2}{2\mu^2 y}\right], \]

$y > 0$, $\lambda > 0$, $\mu > 0$.


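Writing $f(y \mid \mu, \lambda)$ in dispersion model form, we associate $\sqrt{\lambda/(2\pi y^3)}$ with $a(\lambda, y)$ and
$-\frac{(y - \mu)^2}{2\mu^2 y}$ with $t(y, \mu)$. We proceed by calculating $I_{11}$ and $I_{22}$. Since

\[ \frac{\partial t(y, \mu)}{\partial \mu} = \frac{y - \mu}{\mu^3} \]

and

\[ \frac{\partial^2 t(y, \mu)}{\partial \mu^2} = \frac{2\mu - 3y}{\mu^4}, \]

we have

\[ -E\left[\frac{\partial^2 t(y, \mu)}{\partial \mu^2}\right] = \frac{1}{\mu^3}. \]

Since

\[ \log a(\lambda, y) = \log\sqrt{\frac{\lambda}{2\pi}} - \frac{3}{2}\log y, \]

we have

\[ \frac{\partial \log a(\lambda, y)}{\partial \lambda} = \frac{1}{2\lambda}, \qquad \frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2} = -\frac{1}{2\lambda^2}, \]

and

\[ -E\left[\frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2}\right] = \frac{1}{2\lambda^2}. \]

Thus $I_{11} = \frac{\lambda}{\mu^3}$ and $I_{22} = \frac{1}{2\lambda^2}$. Hence, up to constant factors, $h_{11}(\mu) = \mu^{-3}$, $h_{12}(\lambda) = \lambda$,
$h_{21}(\lambda) = \lambda^{-2}$, and $h_{22}(\mu) = 1$.

Suppose $Y_1, \ldots, Y_n$ are independently and identically distributed with an IG density.
Let $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ denote the posterior under Jeffreys' prior with data denoted
by $y$. Then

\[ \Pi_{Jeffreys'}(\mu, \lambda \mid y) \propto \pi_{Jeffreys'}(\mu, \lambda) \times \lambda^{n/2} \exp\left[-\frac{\lambda}{2}\sum_{i=1}^n \frac{(y_i - \mu)^2}{\mu^2 y_i}\right]. \]

Since $\pi_{Jeffreys'} \propto \lambda^{-1/2}\mu^{-3/2}$, we have

\[ \Pi_{Jeffreys'}(\mu, \lambda \mid y) \propto \mu^{-3/2} \lambda^{(n-1)/2} \exp\left[-\frac{\lambda}{2}\sum_{i=1}^n \frac{1}{y_i}\left(\frac{y_i}{\mu} - 1\right)^2\right]. \]

We recognize $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ as a gamma density in $\lambda$, so

\[ \Pi_{Jeffreys'}(\mu \mid y) = \int_0^{\infty} \Pi_{Jeffreys'}(\mu, \lambda \mid y)\, d\lambda \propto \frac{\mu^{-3/2}}{\left[\sum_{i=1}^n \frac{1}{y_i}\left(\frac{y_i}{\mu} - 1\right)^2\right]^{\frac{n+1}{2}}}, \]

as long as $y_i \neq \mu\ \forall\, i$. We can verify the propriety of $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ if we can
show that $\Pi_{Jeffreys'}(\mu \mid y)$ is proper. Letting $t = \frac{1}{\mu}$, a change of variables implies

\[ \int_0^{\infty} \Pi_{Jeffreys'}(\mu \mid y)\, d\mu \propto \int_0^{\infty} \frac{t^{-1/2}}{\left[\sum_{i=1}^n \frac{1}{y_i}(t y_i - 1)^2\right]^{(n+1)/2}}\, dt. \]

Let $g(t) = \left[\sum_{i=1}^n \frac{1}{y_i}(t y_i - 1)^2\right]^{-\frac{n+1}{2}}$. For any value of $t_0$, $g(t)$ is continuous, therefore
bounded on $[0, t_0]$. Say $g(t) \le M$ on $t \in [0, t_0]$. Then

\[ \int_0^{t_0} t^{-1/2} g(t)\, dt \le \int_0^{t_0} t^{-1/2} M\, dt = 2 t_0^{1/2} M. \]

Now consider

\[ \lim_{t \to \infty} \frac{t^{-1/2} g(t)}{t^{-(n + 3/2)}} = \lim_{t \to \infty} \frac{\left[\sum_{i=1}^n \frac{1}{y_i}(t^2 y_i^2 - 2 t y_i + 1)\right]^{-\frac{n+1}{2}}}{t^{-(n+1)}} = \lim_{t \to \infty} \left[\sum_{i=1}^n \left(y_i - \frac{2}{t} + \frac{1}{t^2 y_i}\right)\right]^{-\frac{n+1}{2}}, \tag{2.2.29} \]

which is equal to $\left[\sum_{i=1}^n y_i\right]^{-\frac{n+1}{2}}$, a constant. Thus for sufficiently large $t$ (say $t > t_0$),

\[ t^{-1/2} g(t) \le k\, t^{-(n + 3/2)} \]

for some constant, $k$, by the limit ratio theorem. We are now in position to verify the
propriety of $\Pi_{Jeffreys'}(\mu \mid y)$ by observing that

\[ \int_0^{\infty} t^{-1/2} g(t)\, dt \le 2 t_0^{1/2} M + k \int_{t_0}^{\infty} t^{-(n + 3/2)}\, dt, \]

which is finite. The propriety of the posterior under Jeffreys' prior is thus established.
The proof in the case of the reference prior is similar.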
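The convergence argument can be illustrated numerically. With $g(t) = [\sum_i y_i^{-1}(t y_i - 1)^2]^{-(n+1)/2}$, the integral $\int_0^\infty t^{-1/2} g(t)\,dt$ becomes $2\int_0^\infty g(s^2)\,ds$ after the substitution $t = s^2$, which removes the singularity at zero. The sketch below evaluates the truncated integral for two cut-offs and shows that extending the range changes the value only negligibly; the sample, step size and cut-offs are illustrative assumptions, and a stable numerical value is of course no substitute for the proof.

```python
import math

def g(t, y):
    """g(t) = [sum_i (t y_i - 1)^2 / y_i]^(-(n + 1) / 2)."""
    n = len(y)
    q = sum((t * yi - 1.0) ** 2 / yi for yi in y)
    return q ** (-(n + 1) / 2.0)

def marginal_mass(y, s_max, h=1e-3):
    """Midpoint estimate of 2 * int_0^{s_max} g(s^2) ds, i.e. the t-integral up to s_max^2."""
    steps = int(round(s_max / h))
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += g(s * s, y)
    return 2.0 * total * h

sample = [0.8, 1.5, 2.3]                 # hypothetical distinct observations
mass_50 = marginal_mass(sample, 50.0)
mass_100 = marginal_mass(sample, 100.0)
print(mass_50, mass_100)                 # nearly identical: the tail is negligible
```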

Remark  2.2.1  Two  other  noninformative  priors  will  be  mentioned  in  this  section  for 
completeness.  Liseo  (1993)  has  derived  another  reference  prior  for  the  inverse  Gaus- 


space  (/i^A,  A)  with  an  increasing  sequence  of  nested  rectangles.  Liseo's  reference 
prior,  7rre/(;j2A,A)  fulfills  the  definition  of  reference  prior  when  A  is  the  parameter  of  in- 
terest and  fj,  is  considered  a  nuisance  parameter.  Banerjee  and  Bhattacharyya  (1978) 
use  the  noninformative  prior,  tt^b  oc  fi~^X~^,  in  their  work  on  the  inverse  Gaussian 
distribution.  This  prior  is  derived  by  requiring  that  7rBB(/x"~^|A)  be  constant  and 
7rBB(A)  be  invariant  under  1:1  transformations  of  A.  Banerjee  and  Bhattacharyya 
(1978)  conclude  that  the  posterior  under  Jeffreys'  prior  is  improper.  We  were  not 
able  to  make  the  same  conclusion  (see  section  above). 

Remark  2.2.2  The  posteriors  under  ttbb  and  'KTef{ii'^\,\)  are  both  proper.  The  proof  is 
analogous  to  the  proof  that  shows  that  the  posterior  under  Jeffreys'  prior  is  proper 
(see  section  above). 


sian  distribution,  nref(fi2x,\)  oc  /i  ^/^A        by  considering  covering  the  parameter 


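2.2.5    Fisher-von  Mises

Consider the parameterization of the von Mises density given below

\[ f(\theta \mid \mu, \lambda) = \frac{1}{2\pi I_0(\lambda)}\, e^{\lambda \cos(\theta - \mu)}, \]

$\theta \in [0, 2\pi]$, $\lambda > 0$, $\mu \in [0, 2\pi]$,

where $\theta$ is the angular observation, $\lambda$ is the concentration parameter, and $I_0(\lambda)$ is the
modified Bessel function of order zero. In general,

\[ I_p(\lambda) = \frac{1}{2\pi} \int_0^{2\pi} \cos(p\theta)\, e^{\lambda \cos\theta}\, d\theta. \]

Writing $f(\theta \mid \mu, \lambda)$ in dispersion model form, we associate $\frac{1}{2\pi I_0(\lambda)}$ with $a(\lambda, \theta)$ and
$\cos(\theta - \mu)$ with $t(\theta, \mu)$. We proceed by calculating $I_{11}$ and $I_{22}$. Since

\[ \frac{\partial t(\theta, \mu)}{\partial \mu} = \sin(\theta - \mu) \]

and

\[ \frac{\partial^2 t(\theta, \mu)}{\partial \mu^2} = -\cos(\theta - \mu), \]

we have

\[ -E\left[\frac{\partial^2 t(\theta, \mu)}{\partial \mu^2}\right] = \frac{1}{2\pi I_0(\lambda)} \int_0^{2\pi} \cos(\theta - \mu)\, e^{\lambda \cos(\theta - \mu)}\, d\theta = \frac{I_1(\lambda)}{I_0(\lambda)}. \]

We will follow Mardia's (1979) notation and define $A(\lambda) = \frac{I_1(\lambda)}{I_0(\lambda)}$. To compute $I_{22}$ we
note that

\[ \frac{\partial \log a(\lambda, \theta)}{\partial \lambda} = -\frac{d}{d\lambda}\log I_0(\lambda) = -\frac{1}{I_0(\lambda)}\frac{d}{d\lambda} I_0(\lambda) = -\frac{1}{I_0(\lambda)}\frac{1}{2\pi}\int_0^{2\pi} \cos\theta\, e^{\lambda\cos\theta}\, d\theta = -\frac{I_1(\lambda)}{I_0(\lambda)}, \tag{2.2.30} \]

and

\[ \frac{\partial^2 \log a(\lambda, \theta)}{\partial \lambda^2} = -\frac{1}{I_0(\lambda)}\frac{d}{d\lambda} I_1(\lambda) + \frac{I_1^2(\lambda)}{I_0^2(\lambda)}. \tag{2.2.31} \]

Now

\[ \frac{d}{d\lambda} I_1(\lambda) = \frac{1}{2\pi}\int_0^{2\pi} \cos^2\theta\, e^{\lambda\cos\theta}\, d\theta = \frac{1}{2\pi}\int_0^{2\pi} \frac{1}{2}(1 + \cos 2\theta)\, e^{\lambda\cos\theta}\, d\theta = \frac{1}{2}\left[I_0(\lambda) + I_2(\lambda)\right] \tag{2.2.32} \]

by a trigonometric identity. We now use a relation given in Mardia (1979, p. 63)

\[ I_2(\lambda) = I_0(\lambda) - \frac{2}{\lambda} I_1(\lambda) \tag{2.2.33} \]

to conclude that

\[ \frac{\partial^2 \log a(\lambda, \theta)}{\partial \lambda^2} = -\left[1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2\right]. \]

Thus $I_{11} = \lambda A(\lambda)$ and $I_{22} = 1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2$. Hence $h_{11}(\mu) = 1$, $h_{12}(\lambda) = \lambda A(\lambda)$,
$h_{21}(\lambda) = 1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2$, and $h_{22}(\mu) = 1$.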
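The Bessel-function identities used above are easy to check numerically. The sketch below evaluates $I_\nu(\lambda)$ for integer $\nu$ from its standard power series, verifies the recurrence (2.2.33), and evaluates $h_{21}(\lambda) = 1 - A(\lambda)/\lambda - A(\lambda)^2$. The function names and the fixed number of series terms are assumptions for illustration; the series is adequate only for moderate $\lambda$.

```python
import math

def bessel_i(nu, x, terms=200):
    """Modified Bessel function I_nu(x) for integer nu >= 0 via its power series."""
    term = (0.5 * x) ** nu / math.factorial(nu)   # k = 0 term
    total = term
    for k in range(1, terms):
        term *= (0.5 * x) ** 2 / (k * (k + nu))   # ratio of consecutive series terms
        total += term
    return total

def mean_resultant_A(lam):
    """A(lam) = I_1(lam) / I_0(lam)."""
    return bessel_i(1, lam) / bessel_i(0, lam)

def h21(lam):
    """Fisher information entry 1 - A(lam)/lam - A(lam)^2."""
    a = mean_resultant_A(lam)
    return 1.0 - a / lam - a * a

lam = 3.0
lhs = bessel_i(2, lam)
rhs = bessel_i(0, lam) - (2.0 / lam) * bessel_i(1, lam)
print(lhs, rhs)                      # recurrence (2.2.33)
print(h21(lam), h21(30.0) * 2 * 30.0 ** 2)
```

The last printed ratio compares $h_{21}(30)$ with $1/(2\lambda^2)$, the leading large-$\lambda$ behavior implied by the asymptotic expansion of $A(\lambda)$.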

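Suppose $\theta_1, \ldots, \theta_n$ are independently and identically distributed with a von Mises
density. Let $\Pi_{Jeffreys'}(\mu, \lambda \mid \theta)$ denote the posterior under Jeffreys' prior with data
denoted by $\theta$. To check the posterior for propriety, we will write the likelihood in
a convenient form using more of Mardia's (1979) notation. Define $C = \sum_{i=1}^n \cos\theta_i$,
$S = \sum_{i=1}^n \sin\theta_i$, and $R^2 = C^2 + S^2$. Let $\bar{C} = C/n$, $\bar{S} = S/n$, and $\bar{R} = R/n$. We define $\bar\theta$ to
be the angle such that $\cos\bar\theta = \bar{C}/\bar{R}$ and $\sin\bar\theta = \bar{S}/\bar{R}$. Then the likelihood for data denoted
by $\theta$ is given by

\[ L(\mu, \lambda \mid \theta) = \prod_{i=1}^n \frac{1}{2\pi I_0(\lambda)}\, e^{\lambda \cos(\theta_i - \mu)}. \]

By applying the cosine angle subtraction formula we have

\[ L(\mu, \lambda \mid \theta) = \frac{1}{(2\pi)^n I_0^n(\lambda)}\, e^{n\lambda(\bar{C}\cos\mu + \bar{S}\sin\mu)} = \frac{1}{(2\pi)^n I_0^n(\lambda)}\, e^{n\lambda\bar{R}(\cos\bar\theta\cos\mu + \sin\bar\theta\sin\mu)}. \]

By applying the cosine angle subtraction formula once again, we may then write

\[ L(\mu, \lambda \mid \theta) = \frac{1}{(2\pi)^n I_0^n(\lambda)}\, e^{n\lambda\bar{R}\cos(\bar\theta - \mu)}. \]

Since $\pi_{Jeffreys'} \propto \sqrt{\lambda A(\lambda)\left(1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2\right)}$, we have

\[ \Pi_{Jeffreys'}(\mu, \lambda \mid \theta) \propto \sqrt{\lambda A(\lambda)\left(1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2\right)}\; \frac{e^{n\lambda\bar{R}\cos(\bar\theta - \mu)}}{I_0^n(\lambda)}. \]

We recognize $\Pi_{Jeffreys'}(\mu, \lambda \mid \theta)$ as a von Mises density in $\mu$, so

\[ \Pi_{Jeffreys'}(\lambda \mid \theta) = \int_0^{2\pi} \Pi_{Jeffreys'}(\mu, \lambda \mid \theta)\, d\mu \propto \sqrt{\lambda A(\lambda)\left(1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2\right)}\; \frac{I_0(n\lambda\bar{R})}{I_0^n(\lambda)}. \]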


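We must check $\int_0^{\infty} \Pi_{Jeffreys'}(\lambda \mid \theta)\, d\lambda$ for convergence. The following lemma will help.

Lemma 2.2.1 $\bar{R} \le 1$.

Proof of Lemma 2.2.1 The lemma is proved by noting that

\[ R^2 = \left[\sum_{i=1}^n \cos\theta_i\right]^2 + \left[\sum_{i=1}^n \sin\theta_i\right]^2 = \sum_{i=1}^n \cos^2\theta_i + 2\sum_{1 \le i < j \le n} \cos\theta_i\cos\theta_j + \sum_{i=1}^n \sin^2\theta_i + 2\sum_{1 \le i < j \le n} \sin\theta_i\sin\theta_j. \]

Applying (once again) the cosine angle subtraction formula gives

\[ R^2 = n + 2\sum_{1 \le i < j \le n} \cos(\theta_i - \theta_j). \]

Since $\cos\theta \le 1$, we see that

\[ R^2 \le n + 2\sum_{1 \le i < j \le n} 1 = n + 2\,\frac{n(n-1)}{2} = n^2. \]

This then implies that $\bar{R} \le 1$.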
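The identity at the heart of the lemma, $R^2 = n + 2\sum_{i<j}\cos(\theta_i - \theta_j)$, and the resulting bound $\bar{R} \le 1$ can be checked on simulated angles. The following is a minimal sketch; the random sample and function names are illustrative.

```python
import math
import random

def resultant_length(thetas):
    """R = sqrt(C^2 + S^2) with C = sum cos(theta_i), S = sum sin(theta_i)."""
    c = sum(math.cos(t) for t in thetas)
    s = sum(math.sin(t) for t in thetas)
    return math.hypot(c, s)

def resultant_sq_pairwise(thetas):
    """R^2 computed as n + 2 * sum_{i<j} cos(theta_i - theta_j)."""
    n = len(thetas)
    pair_sum = sum(math.cos(thetas[i] - thetas[j])
                   for i in range(n) for j in range(i + 1, n))
    return n + 2.0 * pair_sum

random.seed(0)
angles = [random.uniform(0.0, 2.0 * math.pi) for _ in range(12)]
r = resultant_length(angles)
print(r ** 2, resultant_sq_pairwise(angles))   # two routes to R^2
print(r / len(angles))                         # mean resultant length, at most 1

identical = [1.0] * 5
print(resultant_length(identical) / len(identical))   # equals 1 only for identical angles
```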


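Now we return to the task of checking $\int_0^{\infty} \Pi_{Jeffreys'}(\lambda \mid \theta)\, d\lambda$ for convergence. The
integrand vanishes at zero since $I_0(0) = 1$ and $A(0) = 0$. For large $\lambda$, the expansion
from Mardia (1979)

\[ A(\lambda) = 1 - \frac{1}{2\lambda} - \frac{1}{8\lambda^2} - \frac{1}{8\lambda^3} + \cdots \]

assures that $A(\lambda)$ is bounded. From Watson and Whittaker (1958), $I_0(\lambda)$ behaves as
$\frac{e^{\lambda}}{\sqrt{2\pi\lambda}}$ for large $\lambda$. Thus for $\lambda > \lambda_0$, $\lambda_0$ large,

\[ \Pi_{Jeffreys'}(\lambda \mid \theta) \propto \sqrt{\lambda A(\lambda)\left(1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2\right)}\; \frac{(2\pi\lambda)^{n/2}}{\sqrt{2\pi n\lambda\bar{R}}}\, e^{-n\lambda(1 - \bar{R})}, \]

which, the square root factor being bounded, behaves as a gamma density in $\lambda$. By our lemma
we have that $\bar{R} \le 1$, thus the integral of $\Pi_{Jeffreys'}(\lambda \mid \theta)$ will converge for any value of
$\bar{R} \ne 1$. If $\bar{R} = 1$, then $\cos(\theta_i - \theta_j) = 1\ \forall\, i < j$, which implies all observations are
identical. Thus in the case of distinct observations, we have verified that the posterior
distribution is proper under Jeffreys' prior. The proof in the case of the reference prior
is similar.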

Remark  2.2.3  For  completeness  the  noninformative  prior  given  by  Bagchi  and  Kadane 
(1992)  is  mentioned.  They  assumed  a  flat  prior  ("purely  for  convenience  and  illus- 
trative purposes").  Their  prior,  ttbk  oc  A,  results  in  a  proper  posterior.  The  proof  is 
analogous  to  the  case  of  Jeffreys'  prior  as  given  above. 

2.2.6    Student's  t 

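Consider the parameterization of the Student's t distribution given below

\[ f(y \mid \mu, \lambda) = \frac{\Gamma(\lambda)}{\sqrt{\pi}\,\Gamma(\lambda - 1/2)} \exp\left\{-\lambda \log\left[1 + (y - \mu)^2\right]\right\}, \tag{2.2.36} \]

where $-\infty < \mu < \infty$, $-\infty < y < \infty$, and $\lambda > 1$. Writing $f(y \mid \mu, \lambda)$ in dispersion
model form, we associate $\frac{\Gamma(\lambda)}{\sqrt{\pi}\,\Gamma(\lambda - 1/2)}$ with $a(\lambda, y)$ and $-\log\left[1 + (y - \mu)^2\right]$ with $t(y, \mu)$.

We proceed by calculating $I_{11}$ and $I_{22}$. We have that

\[ -\frac{\partial^2 t(y, \mu)}{\partial \mu^2} = \frac{2(1 + (y - \mu)^2) - 4(y - \mu)^2}{(1 + (y - \mu)^2)^2} = 2\left[\frac{1}{1 + (y - \mu)^2}\right] - 4\left[\frac{(1 + (y - \mu)^2) - 1}{(1 + (y - \mu)^2)^2}\right] \]

\[ = 2\left[\frac{1}{1 + (y - \mu)^2}\right] - 4\left[\frac{1}{1 + (y - \mu)^2}\right] + 4\left[\frac{1}{(1 + (y - \mu)^2)^2}\right]. \]

Therefore

\[ -E\left[\frac{\partial^2 t(y, \mu)}{\partial \mu^2}\right] = \frac{2\,\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}\left\{\frac{2\,\Gamma(\lambda + 2 - 1/2)}{\Gamma(\lambda + 2)} - \frac{\Gamma(\lambda + 1 - 1/2)}{\Gamma(\lambda + 1)}\right\} = \frac{2\lambda - 1}{\lambda + 1}. \]

The information matrix is given by

\[ I(\mu, \lambda) = \mathrm{Diag}\left(\frac{\lambda(2\lambda - 1)}{\lambda + 1},\ -\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}\right). \tag{2.2.37} \]

We then have that $h_{11}(\mu) = 1$, $h_{12}(\lambda) = \frac{\lambda(2\lambda - 1)}{\lambda + 1}$, $h_{21}(\lambda) = -\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}$, and $h_{22}(\mu) = 1$.
Thus

\[ \pi_{Jeffreys'} \propto \left[-\frac{\lambda(2\lambda - 1)}{\lambda + 1}\, \frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}\right]^{1/2} \tag{2.2.38} \]

and

\[ \pi_{ref(\mu,\lambda)} \propto \pi_{reverse-ref(\mu,\lambda)} \propto \left[-\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}\right]^{1/2}. \tag{2.2.39} \]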
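The expectation leading to $I_{11}$ rests on the moment formula $E[(1 + (Y-\mu)^2)^{-k}] = \Gamma(\lambda)\Gamma(\lambda + k - 1/2) / (\Gamma(\lambda - 1/2)\Gamma(\lambda + k))$. The sketch below evaluates it in closed form and, as an independent check, by quadrature after the substitution $u = \tan\phi$, then confirms that $4E[(1+u^2)^{-2}] - 2E[(1+u^2)^{-1}] = (2\lambda - 1)/(\lambda + 1)$. The function names and the quadrature settings are illustrative assumptions.

```python
import math

def expect_inverse_power(lam, k):
    """E[(1 + (Y - mu)^2)^(-k)] from the Gamma-function ratio."""
    return math.exp(math.lgamma(lam) + math.lgamma(lam + k - 0.5)
                    - math.lgamma(lam - 0.5) - math.lgamma(lam + k))

def expect_inverse_power_numeric(lam, k, steps=20_000):
    """Same expectation by midpoint quadrature in phi, where u = tan(phi)."""
    # int (1 + u^2)^(-p) du over the real line equals int cos(phi)^(2p - 2) dphi
    # over (-pi/2, pi/2), so both numerator and denominator are smooth integrals.
    h = math.pi / steps
    num = den = 0.0
    for i in range(steps):
        c = math.cos(-0.5 * math.pi + (i + 0.5) * h)
        num += c ** (2.0 * (lam + k) - 2.0)
        den += c ** (2.0 * lam - 2.0)
    return num / den

def info_mu_factor(lam):
    """-E[d^2 t / d mu^2] = 4 E[(1+u^2)^(-2)] - 2 E[(1+u^2)^(-1)]."""
    return 4.0 * expect_inverse_power(lam, 2) - 2.0 * expect_inverse_power(lam, 1)

lam = 2.5
print(info_mu_factor(lam), (2 * lam - 1) / (lam + 1))
print(expect_inverse_power(lam, 1), expect_inverse_power_numeric(lam, 1))
```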


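Suppose $y = (Y_1, \ldots, Y_n)$ are independently and identically distributed with a Student's
t pdf as given in (2.2.36). The likelihood is then given by

\[ L(\mu, \lambda \mid y) = \left(\frac{\Gamma(\lambda)}{\sqrt{\pi}\,\Gamma(\lambda - 1/2)}\right)^n \prod_{i=1}^n \frac{1}{(1 + (y_i - \mu)^2)^{\lambda}}. \tag{2.2.40} \]

We will show that the posterior under Jeffreys' prior and the reference prior is proper
when at least two of the $Y_i$'s are distinct. If all the observations are equal (which
includes the case when $n = 1$), the posterior under Jeffreys' prior and the reference
prior is not proper. To establish the latter assume that $y_1 = y_2 = \ldots = y_n$, and
consider

\[ \Pi(\lambda \mid y) \propto g(\lambda) \int_{-\infty}^{\infty} \frac{1}{(1 + (y_1 - \mu)^2)^{n\lambda}}\, d\mu, \tag{2.2.41} \]

where

\[ g(\lambda) = \frac{\Gamma^n(\lambda)}{\Gamma^n(\lambda - 1/2)} \left[\frac{\lambda(2\lambda - 1)}{\lambda + 1}\right]^{\alpha} \left[-\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}\right]^{1/2}. \tag{2.2.42} \]

For the posterior under Jeffreys' prior, $\alpha = 1/2$, and for the posterior under the
reference prior, $\alpha = 0$. Now let $t = \frac{1}{1 + (y_1 - \mu)^2}$; transforming variables gives

\[ \Pi(\lambda \mid y) \propto g(\lambda) \int_0^1 t^{n\lambda - 3/2} (1 - t)^{-1/2}\, dt = g(\lambda)\, B(n\lambda - 1/2,\ 1/2) \propto g(\lambda)\, \frac{\Gamma(n\lambda - 1/2)}{\Gamma(n\lambda)}. \]

Thus

\[ \Pi(\lambda \mid y) \propto \frac{\Gamma^n(\lambda)}{\Gamma^n(\lambda - 1/2)} \left[\frac{\lambda(2\lambda - 1)}{\lambda + 1}\right]^{\alpha} \left[-\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}\right]^{1/2} \frac{\Gamma(n\lambda - 1/2)}{\Gamma(n\lambda)}. \]

Using Stirling's formula we have that for large $\lambda$

\[ \frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)} \approx \lambda^{1/2}. \tag{2.2.43} \]

Using Stirling's formula we can establish that for large $\lambda$

\[ \left[-\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda - 1/2)}\right]^{1/2} \approx \frac{1}{\sqrt{2}\,\lambda} \tag{2.2.44} \]

and

\[ \frac{\Gamma(n\lambda - 1/2)}{\Gamma(n\lambda)} \approx (n\lambda)^{-1/2}. \tag{2.2.45} \]

Thus for sufficiently large $\lambda$, $g(\lambda) \propto \lambda^{\alpha + n/2 - 1}$, hence $\Pi(\lambda \mid y)$ behaves as $\lambda^{\alpha + n/2 - 3/2}$, and we
infer that the posterior under Jeffreys' or the reference prior is improper.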

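Now suppose there are at least two distinct observations. For simplicity suppose
$y_1 = 0$ and $y_2 = a > 0$. Then

\[ \Pi(\lambda \mid y) \propto g(\lambda) \int_{-\infty}^{\infty} \prod_{i=1}^n \frac{1}{(1 + (u - y_i)^2)^{\lambda}}\, du \]

\[ = g(\lambda) \int_{-\infty}^{\infty} \frac{1}{(1 + (u - y_1)^2)^{\lambda}}\, \frac{1}{(1 + (u - y_2)^2)^{\lambda}} \prod_{i=3}^n \frac{1}{(1 + (u - y_i)^2)^{\lambda}}\, du \]

\[ \le g(\lambda) \int_{-\infty}^{\infty} \frac{1}{(1 + u^2)^{\lambda}}\, \frac{1}{(1 + (u - a)^2)^{\lambda}}\, du = 2 g(\lambda) \int_{a/2}^{\infty} \frac{1}{(1 + u^2)^{\lambda}}\, \frac{1}{(1 + (u - a)^2)^{\lambda}}\, du \]

\[ = 2 g(\lambda) \left[\int_{a/2}^{3a/2} \frac{1}{(1 + u^2)^{\lambda}}\, \frac{1}{(1 + (u - a)^2)^{\lambda}}\, du + \int_{3a/2}^{\infty} \frac{1}{(1 + u^2)^{\lambda}}\, \frac{1}{(1 + (u - a)^2)^{\lambda}}\, du\right] \]

\[ \le 2 g(\lambda) \left[a \delta^{\lambda} + \int_{a/2}^{\infty} \frac{1}{(1 + u^2)^{2\lambda}}\, du\right], \]

since $\frac{1}{1 + u^2} \le \frac{1}{1 + a^2/4} = \delta$, for some $\delta < 1$, when $u > a/2$. Further, the substitution
$y = \frac{1}{1 + u^2}$ gives

\[ \int_{a/2}^{\infty} \frac{1}{(1 + u^2)^{2\lambda}}\, du = \frac{1}{2}\int_0^{1 - c} y^{2\lambda - 3/2} (1 - y)^{-1/2}\, dy \tag{2.2.46} \]

for some $0 < c < 1$. Hence

\[ \int_{a/2}^{\infty} \frac{1}{(1 + u^2)^{2\lambda}}\, du \le (1 - c)^{\lambda - 3/2} \int_c^1 z^{-1/2}\, dz. \tag{2.2.47} \]

Thus

\[ \Pi(\lambda \mid y) \le 2 g(\lambda) \left[a \delta^{\lambda} + 2(1 - c^{1/2})(1 - c)^{\lambda - 3/2}\right] \le k\, g(\lambda)\, \epsilon^{\lambda - 3/2} \]

for some constants $k$ and $\epsilon$, where $0 < \epsilon < 1$. We have established that for large $\lambda$,
$g(\lambda)$ behaves as $\lambda^{\alpha + n/2 - 1}$, thus by writing $\epsilon^{\lambda - 3/2} = e^{(\lambda - 3/2)\log\epsilon}$, we recognize the bound on
$\Pi(\lambda \mid y)$ as a gamma density and hence $\Pi(\mu, \lambda \mid y)$ is proper.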




2.2.7  Power 


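Consider the parameterization of the power density given below

\[ f(y \mid \mu, \lambda) = \frac{\lambda^{\gamma}}{2\gamma\Gamma(\gamma)}\, e^{-\lambda |y - \mu|^{\delta}}, \]

$\delta = 1/\gamma$, $-\infty < \mu < \infty$, $-\infty < y < \infty$, $\lambda > 0$.

Writing $f(y \mid \mu, \lambda)$ in dispersion model form, we associate $\frac{\lambda^{\gamma}}{2\gamma\Gamma(\gamma)}$ with $a(\lambda, y)$ and
$-|y - \mu|^{\delta}$ with $t(y, \mu)$. We proceed by calculating $I_{11}$ and $I_{22}$. Taking $\mu = 0$ without loss
of generality, we have that

\[ -E\left[\frac{\partial^2 t(y, \mu)}{\partial \mu^2}\right] = \delta(\delta - 1)\, \frac{\lambda^{\gamma}}{2\gamma\Gamma(\gamma)} \left[\int_{-\infty}^0 (-y)^{\delta - 2} e^{-\lambda(-y)^{\delta}}\, dy + \int_0^{\infty} y^{\delta - 2} e^{-\lambda y^{\delta}}\, dy\right] \]

\[ = 2\,\delta(\delta - 1)\, \frac{\lambda^{\gamma}}{2\gamma\Gamma(\gamma)} \int_0^{\infty} y^{\delta - 2} e^{-\lambda y^{\delta}}\, dy. \]

Let $z = y^{\delta}$, and by transforming the integral we have that

\[ -E\left[\frac{\partial^2 t(y, \mu)}{\partial \mu^2}\right] = \delta(\delta - 1)\, \frac{\lambda^{\gamma}}{\gamma\Gamma(\gamma)}\, \gamma \int_0^{\infty} z^{-\gamma} e^{-\lambda z}\, dz = \frac{\delta(\delta - 1)\lambda^{\gamma}\, \gamma\, \Gamma(1 - \gamma)}{\lambda^{1 - \gamma}\, \gamma\, \Gamma(\gamma)} = \frac{1 - \gamma}{\gamma^2}\, \lambda^{2\gamma - 1}\, \frac{\Gamma(1 - \gamma)}{\Gamma(\gamma)}. \tag{2.2.48} \]

Since

\[ \frac{\partial \log a(\lambda, y)}{\partial \lambda} = \frac{\gamma}{\lambda} \]

and

\[ \frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2} = -\frac{\gamma}{\lambda^2}, \]

we have that

\[ -E\left[\frac{\partial^2 \log a(\lambda, y)}{\partial \lambda^2}\right] = \frac{\gamma}{\lambda^2}. \]

Thus $I_{11} = \frac{1 - \gamma}{\gamma^2}\, \lambda^{2\gamma}\, \frac{\Gamma(1 - \gamma)}{\Gamma(\gamma)}$ and $I_{22} = \frac{\gamma}{\lambda^2}$. Hence, up to constant factors, $h_{11}(\mu) = 1$,
$h_{12}(\lambda) = \lambda^{2\gamma}$, $h_{21}(\lambda) = \lambda^{-2}$, and $h_{22}(\mu) = 1$.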

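Suppose $Y_1, \ldots, Y_n$ are independently and identically distributed with a power
density. Let $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ denote the posterior under Jeffreys' prior with data
denoted by $y$. Then

\[ \Pi_{Jeffreys'}(\mu, \lambda \mid y) \propto \pi_{Jeffreys'}(\mu, \lambda) \times \lambda^{n\gamma} e^{-\lambda\sum_{i=1}^n |y_i - \mu|^{\delta}}. \]

Since $\pi_{Jeffreys'} \propto \lambda^{\gamma - 1}$, we have

\[ \Pi_{Jeffreys'}(\mu, \lambda \mid y) \propto \lambda^{(n+1)\gamma - 1} e^{-\lambda\sum_{i=1}^n |y_i - \mu|^{\delta}}. \]

We recognize $\Pi_{Jeffreys'}(\mu, \lambda \mid y)$ as a gamma density in $\lambda$, so our task becomes checking
$\Pi_{Jeffreys'}(\mu \mid y)$ for propriety.

\[ \Pi_{Jeffreys'}(\mu \mid y) = \int_0^{\infty} \Pi_{Jeffreys'}(\mu, \lambda \mid y)\, d\lambda \propto \int_0^{\infty} \lambda^{(n+1)\gamma - 1} e^{-\lambda\sum_{i=1}^n |y_i - \mu|^{\delta}}\, d\lambda \propto \left[\sum_{i=1}^n |y_i - \mu|^{\delta}\right]^{-(n+1)\gamma}. \]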

To  proceed  we  will  apply  the  following  inequality  due  to  Liapounov. 


Inequality  2.2.1  Define  a  random  variable  Z  such  that 


P[Z  =  Yi\  =  -       i  =  l,...,n. 
n 


Then  for  p  >  1, 
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By  applying  the  inequality  we  note  that 


-(n+l)/p 


< 


1  " 


-(n+1) 


(2.2.49) 


We  will  now  be  able  to  show  that  HjeZ/reya'CA*!?/)  is  a  proper  density.  We  have  that 


^Jeffreys' {^J'\y)  0^ 


Lt=l 


-(n+l)7 


< 


.i=l 


-(n+1) 


The  last  inequality  follows  from  (2.2.49)  with  p  =  l/j.  We  have  reduced  our  task 
to  checking  [EiLi  —  m|]~'"^^^c^A*  for  convergence.  We  will  proceed  by  splitting 
the  integral  in  a  convenient  way.  Note  that 


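$$\int_{-\infty}^{\infty}\left[\sum_{i=1}^{n}|y_i-\mu|\right]^{-(n+1)}d\mu
= \underbrace{\int_{-\infty}^{\infty}\left[\sum_{i=1}^{n}|y_i-\mu|\right]^{-(n+1)}I_{[|\mu-\bar{y}|>c]}\,d\mu}_{\text{integral I}}
+ \underbrace{\int_{\bar{y}-c}^{\bar{y}+c}\left[\sum_{i=1}^{n}|y_i-\mu|\right]^{-(n+1)}d\mu}_{\text{integral II}},$$

where $c$ is a positive constant and $I_{[|\mu-\bar{y}|>c]}$ denotes the indicator function. Observe that

$$\int_{-\infty}^{\infty}\left[\sum_{i=1}^{n}|y_i-\mu|\right]^{-(n+1)}I_{[|\mu-\bar{y}|>c]}\,d\mu
\le \int_{-\infty}^{\infty}\left[n|\mu-\bar{y}|\right]^{-(n+1)}I_{[|\mu-\bar{y}|>c]}\,d\mu,$$

since

$$\sum_{i=1}^{n}|y_i-\mu| \ge \left|\sum_{i=1}^{n}(y_i-\mu)\right| = n|\mu-\bar{y}|.$$

Let $t = \mu - \bar{y}$; then

$$\int_{-\infty}^{\infty}\left[n|\mu-\bar{y}|\right]^{-(n+1)}I_{[|\mu-\bar{y}|>c]}\,d\mu \propto 2\int_{c}^{\infty}t^{-(n+1)}\,dt = \frac{2c^{-n}}{n},$$

a constant. Thus integral I is finite. Recall the following inequality

$$\frac{1}{n}\sum_{i=1}^{n}|y_i-\mu| \ge \frac{1}{n}\sum_{i=1}^{n}|y_i-y_{median}|.$$

We then have that

$$\int_{\bar{y}-c}^{\bar{y}+c}\left[\sum_{i=1}^{n}|y_i-\mu|\right]^{-(n+1)}d\mu
\le \int_{\bar{y}-c}^{\bar{y}+c}\left[\sum_{i=1}^{n}|y_i-y_{median}|\right]^{-(n+1)}d\mu
= 2c\left[\sum_{i=1}^{n}|y_i-y_{median}|\right]^{-(n+1)},$$

a constant. We have established that integral II is also finite, which then implies that $\Pi_{Jeffreys'}(\mu\mid y)$ is a proper density. The proof that the posterior is proper under the reference prior follows in a similar fashion.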
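
The two elementary bounds used in this argument, $\sum|y_i-\mu| \ge n|\mu-\bar{y}|$ and $\sum|y_i-\mu| \ge \sum|y_i-y_{median}|$, can be checked numerically. The Python sketch below is only an illustration and is not part of the original computations; the function names, the simulated data, and the grid of $\mu$ values are our own assumptions.

```python
import random
import statistics


def abs_dev_sum(ys, mu):
    """Sum of absolute deviations of the data from a candidate location mu."""
    return sum(abs(y - mu) for y in ys)


def check_bounds(ys, mus, tol=1e-9):
    """Return True if both lower bounds hold at every mu in mus.

    Bound 1: sum|y_i - mu| >= n * |mu - ybar|       (triangle inequality)
    Bound 2: sum|y_i - mu| >= sum|y_i - y_median|   (the median minimises the sum)
    """
    n = len(ys)
    ybar = statistics.fmean(ys)
    floor_at_median = abs_dev_sum(ys, statistics.median(ys))
    for mu in mus:
        s = abs_dev_sum(ys, mu)
        if s < n * abs(mu - ybar) - tol or s < floor_at_median - tol:
            return False
    return True


rng = random.Random(0)                      # fixed seed so the check is repeatable
ys = [rng.gauss(0.0, 1.0) for _ in range(7)]
mus = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
bounds_hold = check_bounds(ys, mus)
print(bounds_hold)
```

A check of this kind does not replace the proof; it only confirms that the bounds behave as stated on one simulated data set.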


2.2.8  Summary 


We can summarize our calculations by recording the values of $a(\cdot)$, $t(\cdot)$, and the $h_{ij}(\cdot)$'s of the considered densities in the table provided. These component functions will give us information about optimality in the sense of probability matching in Chapter 3.




Table  2.1.  Summary  of  Component  Functions 


Density 

a{X)  b{y) 

t{y,fj') 

Normal 
EDM 

1 

2 

1 

A 

A-2 

Lognormal 

-1 

y 

\{\ogy-  lif 

1 

A 

/A 

Gamma 
EDM 

m 

_1 

y  ' 

-(J-MJ)) 

_2 

A* 

A 

^iogr(A)-^ 

Inverse 
Gaussian 
EDM 

nr 

V  27r 

2^2  y 

A 

A-2 

von 
Mises 
location 

1 

1 

cos  {y  -  n) 

1 

XA{X) 

1     ^  ^(A)2 

27r/o(A) 

Student's 
t 

location 

r(A) 

1 

-  log(l  +  (y  - 

1 

A(2A-1) 

(i2  log  ^(^) 

"  '"8  r(A-i/2) 

0Fr(A-i/2) 

A+1 

dA2 

Power 
location 

AT 

27r(A) 

1 

-\y-^^\' 

1 

A2T 

A-2 

CHAPTER  3 
PROBABILITY  MATCHING  PRIORS 


The  notion  of  probability  matching  (PM)  is  a  relatively  new  criterion  for  selection 
of  noninformative  priors,  originally  due  to  Welch  and  Peers  (1963),  and  Peers  (1965), 
and  more  recently  popularized  by  Stein  (1985).  The  advantage  of  using  a  PM  prior 
is  the  reconciliation  afforded  between  Bayesian  and  frequentist  inference.  When  not 
much  previous  information  is  available  about  a  parameter,  it  may  be  desirable  to 
obtain  a  Bayesian  credible  set  which  closely  approximates  a  frequentist  confidence 
interval.  Bayesian  inference  incorporating  PM  priors  is  a  potentially  powerful  method 
for  approximating  frequentist  confidence  intervals  in  cases  where  insufficient  theory 
exists  to  provide  them. 

Let $y_1,\ldots,y_n$ be iid with common pdf $f(y\mid\theta,\omega)$, where $\theta$ is the parameter of interest, and $\omega$ is the nuisance parameter. A prior $\pi(\theta,\omega)$ is said to satisfy the first order probability matching criterion (FOPMC) if

P{e>e,_^{7r,y,,...,yn)\e)=a  +  Op{n-'2),  (3.0.1) 

where $\theta_{1-\alpha}(\pi, y_1,\ldots,y_n)$ is the upper $\alpha$th percentile of the posterior distribution of $\theta$ under the prior $\pi$. Roughly speaking, (3.0.1) requires the tail probabilities of posterior distributions to match asymptotically (in probability) the corresponding frequentist coverage probabilities.

In this chapter we will apply theorems found in Datta and Ghosh (1995a), and Mukerjee and Dey (1993), to dispersion models. We will characterize the conditions when Jeffreys' and the reference prior satisfy the probability matching criteria. Moreover, we will give a recipe for a second order optimality prior, Wilson's prior, which is distinct from the reference prior in the case of the Fisher-von Mises, the gamma, and the Student's t densities. We conclude by reporting simulation results and analyzing two real data sets.

3.1    First Order Optimality

Theorem 3.1.1 For densities in the dispersion model family whose information matrix elements can be written as

$$I_{11} \propto h_{11}(\mu)h_{12}(\lambda),$$

$$I_{22} \propto h_{21}(\lambda)h_{22}(\mu),$$

where

$$h_{ij} > 0 \quad \forall\ i, j,$$

then

(i) $\pi_{ref(\mu,\lambda)}$ satisfies the first order probability matching criterion for both $\mu$ and $\lambda$.

(ii) $\pi_{Jeffreys'}$ satisfies the first order probability matching criterion for $\mu$ iff $h_{22}(\mu)$ is a constant. [This condition is satisfied for densities which are location subtypes.]

(iii) $\pi_{Jeffreys'}$ satisfies the first order probability matching criterion for $\lambda$ iff $h_{12}(\lambda)$ is a constant. [This condition is never satisfied for densities which are EDM subtypes.]


Proof of Theorem 3.1.1 To prove the above we will use a theorem given in Peers (1965). The theorem reveals that $\pi(\cdot)$ satisfies first order optimality for $\mu$ iff

$$\frac{\partial}{\partial\mu}\left[I_{11}^{-1/2}(\mu,\lambda)\,\pi(\cdot)\right] = 0, \qquad (3.1.2)$$

and $\pi(\cdot)$ satisfies first order optimality for $\lambda$ if and only if

$$\frac{\partial}{\partial\lambda}\left[I_{22}^{-1/2}(\mu,\lambda)\,\pi(\cdot)\right] = 0. \qquad (3.1.3)$$


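Datta (1996) has shown that $\pi(\cdot)$ satisfies first order optimality for both $\mu$ and $\lambda$ if and only if (3.1.2) and (3.1.3) hold. Recall that $\pi_{ref(\mu,\lambda)} \propto h_{11}^{1/2}(\mu)h_{21}^{1/2}(\lambda)$ and $\pi_{Jeffreys'} \propto \sqrt{h_{11}(\mu)h_{12}(\lambda)h_{21}(\lambda)h_{22}(\mu)}$. The proof of (i) follows since

$$\frac{\partial}{\partial\mu}\left[h_{11}^{-1/2}(\mu)h_{12}^{-1/2}(\lambda)h_{11}^{1/2}(\mu)h_{21}^{1/2}(\lambda)\right]
= \frac{\partial}{\partial\mu}\left[h_{12}^{-1/2}(\lambda)h_{21}^{1/2}(\lambda)\right] = 0,$$

and

$$\frac{\partial}{\partial\lambda}\left[h_{21}^{-1/2}(\lambda)h_{22}^{-1/2}(\mu)h_{11}^{1/2}(\mu)h_{21}^{1/2}(\lambda)\right]
= \frac{\partial}{\partial\lambda}\left[h_{22}^{-1/2}(\mu)h_{11}^{1/2}(\mu)\right] = 0.$$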

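The proof of (ii) obtains since

$$\frac{\partial}{\partial\mu}\left[I_{11}^{-1/2}(\mu,\lambda)\,\pi_{Jeffreys'}\right]
= \frac{\partial}{\partial\mu}\left[h_{11}^{-1/2}(\mu)h_{12}^{-1/2}(\lambda)h_{11}^{1/2}(\mu)h_{12}^{1/2}(\lambda)h_{21}^{1/2}(\lambda)h_{22}^{1/2}(\mu)\right]
= h_{21}^{1/2}(\lambda)\frac{\partial}{\partial\mu}\left[h_{22}^{1/2}(\mu)\right],$$

which equals zero if and only if $h_{22}(\mu)$ is a constant. For location subtypes and Jørgensen proper dispersion models, $h_{22}(\mu) = 1$. Thus Jeffreys' prior satisfies first order optimality for location and Jørgensen proper dispersion models. The proof of (iii) follows in a similar vein by noting

$$\frac{\partial}{\partial\lambda}\left[I_{22}^{-1/2}(\mu,\lambda)\,\pi_{Jeffreys'}\right]
= h_{11}^{1/2}(\mu)\frac{\partial}{\partial\lambda}\left[h_{12}^{1/2}(\lambda)\right],$$

which equals zero if and only if $h_{12}(\lambda)$ is a constant. For EDM subtypes, $h_{12}(\lambda) = \lambda$. Thus Jeffreys' prior never satisfies first order optimality for $\lambda$ for EDMs.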

3.2    Second  Order  Optimality 


The  next  theorem  provides  the  unique  second  order  prior  in  the  case  where  //  is 
regarded  as  the  parameter  of  interest  and  A  as  a  nuisance  parameter,  if  it  exists.  The 
theorem  also  characterizes  the  conditions  for  second  order  optimality  in  the  reverse 
case,  i.e.  A  is  regarded  as  the  parameter  of  interest  and  /i  is  the  nuisance  parameter. 

Theorem 3.2.1 For densities in the dispersion model family whose information matrix elements can be written as

$$I_{11} \propto h_{11}(\mu)h_{12}(\lambda),$$

$$I_{22} \propto h_{21}(\lambda)h_{22}(\mu),$$

where

$$h_{ij} > 0 \quad \forall\ i, j,$$

then

(i) The prior, $\pi_{Wilson}$, given by


TT.,,^  oc  A/i2i(A)/il(^(M)e5/^^^(^''^)''"W'^r3''^W«'A 


where 


c(A,  n) 


V* 


(3.2.4) 


(3.2.5)

is second order optimal in the case where $\mu$ is regarded as the parameter of interest and $\lambda$ as a nuisance parameter iff $c(\lambda,\mu)$ satisfies $\frac{\partial}{\partial\mu}c(\lambda,\mu) = 0$.

(ii) A prior, $\pi(\lambda,\mu)$, is second order optimal in the case where $\lambda$ is regarded as the parameter of interest and $\mu$ as a nuisance parameter iff the following conditions are satisfied

7r(A,/i)  =  d(/u)/i^f(A)  (3.2.6) 


d_ 
dX 


\oga{X,y)  +  t{y,fi) 


(3.2.7) 


where $d(\mu)$ and $k(\mu)$ are arbitrary functions of $\mu$. [Note that Jeffreys' prior will not be second order optimal for $\lambda$ if $h_{12}(\lambda) \neq 1$.]

Proof of Theorem 3.2.1 We will prove (i) by applying a result from Mukerjee and Dey (1993). They prove that a prior $\pi(\lambda,\mu)$ is second order optimal in the case where $\mu$ is regarded as the parameter of interest and $\lambda$ as a nuisance parameter iff the following hold for some function $d(\lambda)$


7r{X,fi)  =  d{X)h\{'{i,)h\i\X) 


(3.2.8) 


and 


{d{X)-'}  ^\d{X)hn'^\fi)h^,'^'{X)h^l{X)h^,Hf^)E 


f_d_ 

a/i2  dx 


[loga(A,y)  +  Xt{y,iJ.)] 


6  dfi 


h',^l\n)h-,!'\X)E 


{log  a{X,  y)  +  Xt{y,  fi)) 


=  0. 


(3.2.9) 


We  can  simplify  (3.2.9)  by  noting  that 

d 


E 


dfi'^  dX 


[loga(A,y)  +  Xt{y,n)] 


=  jhn{n)hi2{X) 
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and 


E 


d_ 


(logo(A,y)  +  Ai(y,/x)) 


=  yE 


If  we  then  define  the  functions  /  and  h  by 


f  =  h\{'{fi)h\i'{X)h^,\\)h^,\fi)X-' 


h-,l'\ix)E 


we  can  rewrite  (3.2.9)  as 


—  [d{X)f]  +  d{X)h  =  0. 


We can now explicitly solve the above differential equation for $d(\lambda)$. The solution is given by


d{X)  oc  exp 


{-!>] 


where  /(/i)  is  an  arbitrary  function  of  /i.  If  we  substitute  the  values  of  /  and  h  into 
(3.2.1),  and  simplify,  we  have  that  -  v  «. 


d{X)  oc  mXK{l'\ti)h22{^i)h-,^'\X)h2,{X) 


1/2/ 


X  exp 


X'h-[^l\X)h-,^l\X)l  [h-,^'\n)E  [(|it(2/,/i))']} 


6/ilf(/^)/ilr(A)/x2i'(A)/i22(/^) 


dX 


Since  d{X)  must  be  a  function  only  of  A,  we  require  that 


We  also  require  that 


.      h22{^l)  d 


be constant with respect to $\mu$, i.e. $\frac{\partial}{\partial\mu}c(\lambda,\mu) = 0$. We then obtain a second order optimal prior, $\pi_{Wilson}$, by substituting the solution of $d(\lambda)$ into (3.2.8).

We will prove (ii) by again applying the result from Mukerjee and Dey (1993). They prove that a prior $\pi(\lambda,\mu)$ is second order optimal in the case where $\lambda$ is regarded as the parameter of interest and $\mu$ as a nuisance parameter iff the following hold for some function $d(\mu)$

7r(A,/x)  =  d(//)/i^{'(A)/i^/'(/x)  (3.2.10) 


and 


c/(/i)-'|^|rf(/i)/irin/^)/^r2nA)/i27^'(A)/i22^'(/x)^ 


[loga(A,j/)  +  \t{y 


\_d_ 


^  (loga(A,?/)  +  At(?/,/i)) 


=(3C2.11) 


Since 


E 


{\oga{X,y)  +  Xt{y,n)) 


=  E[0]  =  0, 


we  can  rewrite  (3.2.11)  as 


d_ 
dX 


'■21 


{X)E 


d  ^ 
—  log  a{X,y)  +  t{y,fi) 


=  0, 


which  holds  iff 


hliHX)  -  k{fi)E 


'd_ 

dx 


log  a{X,  y)  +  t{y,n) 


where $k(\mu)$ is some arbitrary function of $\mu$.


Remark 3.2.1 Since (3.2.7) does not involve the prior $\pi(\cdot)$ we observe that the SOPMC for $\lambda$ fails to discriminate amongst first order optimal priors. In fact, (3.2.6) and (3.2.7) imply that either every first order optimal prior for $\lambda$ is also second order optimal or that no second order optimal prior for $\lambda$ will exist. This is because the condition (3.2.7) depends on characteristics of the dispersion model and is unrelated to the prior. In the case of the inverse Gaussian distribution, (3.2.7) does hold. Hence, every first order optimal prior is also second order optimal. Sun and Ye (1996) have claimed in this case that $\pi_{ref(\mu,\lambda)}$ is the unique second order optimal prior. This contradicts our conclusion.

Remark 3.2.2 Note that $\pi_{Wilson}$ is not first order optimal for $\lambda$ unless $h_{21}(\lambda)$ is constant since (3.1.3) is not satisfied otherwise.

The value of Theorem 3.2.1 is the construction of a prior which should render the Bayesian's credible set within $O_p(n^{-1})$ of the frequentist's confidence interval. The second order optimal prior in the case where $\mu$ is the parameter of interest has been denoted by $\pi_{Wilson}$. Is $\pi_{Wilson}$ distinct from $\pi_{ref(\mu,\lambda)}$ or $\pi_{Jeffreys'}$? We give an answer in the following corollary, which can be applied subject to a condition often satisfied by common dispersion models.

Corollary 3.2.1 Let $f(y\mid\mu,\lambda)$ be any dispersion model whose information matrix elements can be written as

$$I_{11} \propto h_{11}(\mu)h_{12}(\lambda),$$

$$I_{22} \propto h_{21}(\lambda)h_{22}(\mu),$$

where the $h_{ij}$'s $> 0$. Suppose we consider $\mu$ to be the parameter of interest and regard $\lambda$ as a nuisance parameter. If $c(\lambda,\mu) = 0$, where $c(\lambda,\mu)$ is defined in (3.2.5), then

$$\pi_{Wilson} = \pi_{Jeffreys'} \quad \text{iff} \quad h_{21}^{-1/2}(\lambda)h_{12}^{1/2}(\lambda)h_{22}^{1/2}(\mu) = \lambda,$$

and

$$\pi_{Wilson} = \pi_{ref(\mu,\lambda)} \quad \text{iff} \quad h_{21}^{-1/2}(\lambda) = \lambda.$$

Proof of Corollary 3.2.1 The proof follows from (2.1.7), (2.1.8), and Theorem 3.2.1.

We can apply Corollary 3.2.1 to the inverse Gaussian, lognormal, von Mises, and power densities considered in the last section since it is not difficult to verify that $c(\lambda,\mu) = 0$ for each. For these densities $h_{21}^{-1/2}(\lambda)h_{12}^{1/2}(\lambda)h_{22}^{1/2}(\mu) \neq \lambda$, thus Jeffreys' prior is not second order optimal for any of them. In the case of the inverse Gaussian, lognormal, and power densities, $h_{21}^{-1/2}(\lambda) = \lambda$, so $\pi_{Wilson} = \pi_{ref(\mu,\lambda)}$. Since $h_{21}^{-1/2}(\lambda) \neq \lambda$, $\pi_{Wilson} \neq \pi_{ref(\mu,\lambda)}$ for the Fisher-von Mises, gamma, and Student's t densities. We specify $\pi_{Wilson}$ for the Fisher-von Mises density in the following corollary.

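Corollary 3.2.2 For the von Mises density,

$$\pi_{Wilson} \propto \lambda\left[1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2\right] \qquad (3.2.12)$$

and $\Pi_{Wilson}(\mu,\lambda\mid y)$ is a proper posterior.

Proof of Corollary 3.2.2 We will prove Corollary 3.2.2 by an application of Corollary 3.2.1. Recall that for the von Mises density we have $h_{11}(\mu) = 1$, $h_{12}(\lambda) = \lambda A(\lambda)$, $h_{21}(\lambda) = 1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2$, $h_{22}(\mu) = 1$, and $t(y,\mu) = \cos(y-\mu)$. Since

$$\frac{\partial}{\partial\mu}t(y,\mu) = \sin(y-\mu),$$

we have that

$$E\left[\left(\frac{\partial}{\partial\mu}t(y,\mu)\right)^{3}\right] = \int_{0}^{2\pi}\sin^{3}(y-\mu)\,\frac{e^{\lambda\cos(y-\mu)}}{2\pi I_0(\lambda)}\,dy = 0,$$

since the integrand is an odd function of $y-\mu$. Thus $c(\lambda,\mu) = 0$. From Corollary 3.2.1 we then have that

$$\pi_{Wilson} \propto \lambda h_{21}(\lambda)h_{11}^{1/2}(\mu) = \lambda\left[1 - \frac{A(\lambda)}{\lambda} - A(\lambda)^2\right].$$

The proof to show that $\Pi_{Wilson}(\mu,\lambda\mid y)$ is a proper posterior mirrors that of the posterior under Jeffreys' prior. Details can be found in Section 2.2.5.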


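Corollary 3.2.3 For the Student's t density,

$$\pi_{Wilson} \propto \lambda\left|\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda-\frac{1}{2})}\right| \qquad (3.2.13)$$

and $\Pi_{Wilson}(\mu,\lambda\mid y)$ is a proper posterior.


Proof of Corollary 3.2.3 Recall that for the Student's t density: $h_{11}(\mu) = 1$, $h_{12}(\lambda) = \frac{\lambda(2\lambda-1)}{\lambda+1}$, $h_{21}(\lambda) = \left|\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda-\frac{1}{2})}\right|$, and $h_{22}(\mu) = 1$. Since

$$\frac{\partial}{\partial\mu}t(y,\mu) = \frac{2(y-\mu)}{1+(y-\mu)^2},$$

we have that

$$E\left[\left(\frac{\partial}{\partial\mu}t(y,\mu)\right)^{3}\right]
= \int_{-\infty}^{\infty}\frac{\Gamma(\lambda)}{\sqrt{\pi}\,\Gamma(\lambda-\frac{1}{2})}\,\frac{8(y-\mu)^{3}}{[1+(y-\mu)^2]^{\lambda+3}}\,dy = 0,$$

since the integrand is an odd function. Thus $c(\lambda,\mu) = 0$. From Corollary 3.2.1 we then have that

$$\pi_{Wilson} \propto \lambda\left|\frac{d^2}{d\lambda^2}\log\frac{\Gamma(\lambda)}{\Gamma(\lambda-\frac{1}{2})}\right|.$$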

The proof to show that $\Pi_{Wilson}(\mu,\lambda\mid y)$ is a proper posterior mirrors that of the posterior under Jeffreys' prior. Details can be found in Section 2.2.6.

Corollary 3.2.4 For the gamma density,

$$\pi_{Wilson} \propto \lambda\left[\frac{d^2}{d\lambda^2}\log\Gamma(\lambda) - \frac{1}{\lambda}\right] \qquad (3.2.14)$$

and $\Pi_{Wilson}(\mu,\lambda\mid y)$ is a proper posterior.

Proof of Corollary 3.2.4 The proof to establish (3.2.14) will be postponed to the next section where it will follow transparently from another result. To show that $\Pi_{Wilson}(\mu,\lambda\mid y)$
is  a  proper  posterior  is  a  matter  of  inserting  the  value  of  1  for  both  a  and  ^  in  the 
proof  contained  in  section  (2.2.3).  Conditions  (2.2.22)  and  (2.2.28)  imply  that  the 
posterior  under  Wilson's  prior  will  converge  if  n  >  1. 

3.3    Existence  of  Second  Order  Matching  priors  for  EDM  Subtypes 

Will second order probability matching priors exist for all members of the dispersion model family? We can partially answer this question in the case of EDM subtypes which are also Jørgensen proper (i.e. $h_{22} = 1$). The following Corollary to Theorem 3.2.1 will help us.

Corollary  3.3.1  For  dispersion  models  which  are  EDM  subtypes  such  that 


-E 


^  \oga{X,y) 


OC  /12i(A)/122(m) 


where  h2j  >  0  for  j  =  1, 2,  then  the  prior, 
where 

c(A,/.)  =  x-'h22{^i)v-'/'{^i)^  [K"{9)-'^\"'{e)] 

is  second  order  optimal  in  the  case  where  fi  is  the  parameter  of  interest  and  A  is 
regarded  as  a  nuisance  parameter  iff  ^c(A, /i)  =  0. 


Table 3.1. Classification of some Noninformative priors for some Jørgensen Dispersion Models ("√" indicates that the prior belongs to class heading while "×" indicates that the prior does not belong)


Density  and  prior 

Jeffreys' 

Re 

•erer 
A 

ice 
A 

Prob 
1st 
for  \i 

)ability 
2nd 
for 

Match 
1st 
forA 

ing 
2nd 
forA 

Normal 

Jeffreys'  oc  A'^/^ 

X 

X 

x/ 

X 

X 

X 

r)_r/,.    \\        \— 1 

lieiy^,  A)  oa  A 

X 

/ 
V 

/ 
V 

X 

/ 

V 

/ 

V 

1 
V 

/ 
V 

ijUgliOrXIlcli 

Jeffreys'  oc  X'^^^ 

X 

X 

\/ 

X 

X 

X 

Ref(//,  A)  oc  A-i 

X 

v/ 

X 

n/ 

Gamma 

V 

X 

X 

V 

V 

X 

X 

X 

Ref(/i,A)  oc  V]^l|^logr(A)  -  { 

X 

/ 

V 

/ 

V 

X 

/ 

V 

X 

/ 

V 

X 

WilsonaA|llogr(A)-i 

X 

? 

? 

X 

v/ 

X 

X 

Inverse  Gaussian 

7  ix         )           — 3/2  \  —1/2 

Jeffreys  oc  /i  ^''^X  ^'^ 

/ 

V 

X 

X 

/ 

V 

/ 

V 

X 

X 

X 

RefC//  W  nc  //"^/^A""^ 

A. 

V 

^/ 

V 

.A. 

V 

X 

V 

V 

Ref(^2;^,  A)  oc 

X 

X 

V 

X 

V 

X 

X 

X 

B  and  B  a  n'^X'^ 

X 

? 

? 

X 

X 

X 

V 

V 

Fisher- Von  Mises 

Jeffreys' ocyAA(A)|l-^ -^2(^)1 

\/ 

X 

X 

sj 

X 

X 

X 

Ref(^,A)ocy|l-^ -^2(^)1 

X 

x/ 

X 

X 

x/ 

X 

vviison  oc       —  aTa)  —  1/1 

X 

? 

? 

X 

/ 
V 

/ 

V 

X 

X 

Bagchi  and  Kadane  a  A 

X 

? 

? 

X 

V 

X 

X 

X 

Student's  t 

T„ff  >_     /A(2A-1)|  d2     r(A)  1 

Jeffreys  oc  ^         |  \ 

1 

V 

X 

X 

/ 

V 

/ 

V 

X 

X 

X 

Ref(/x,A)cxy|j'i^| 

X 

^/ 

X 

v/ 

X 

\/ 

X 

Wilson  cxAl^j^ 

X 

? 

? 

X 

n/ 

X 

X 

Power 

Jeffreys'  oc  A"^"^ 

X 

X 

x/ 

X 

X 

X 

Ref(/i,  A)  oc  A-i 

X 

X 

x/ 

Proof of Corollary 3.3.1 For EDM subtypes, $I_{11}(\mu,\lambda)$ is proportional to the product of $h_{11}(\mu) = \frac{1}{V(\mu)}$ and $h_{12}(\lambda) = \lambda$. Thus assuming $I_{22}(\mu,\lambda)$ is proportional to $h_{21}(\lambda)h_{22}(\mu)$, we can apply Theorem 3.2.1 to derive Wilson's prior. We have


Since 


"Wilson  '-*-T/-/\'' 


=  ey-K{9), 


d_ 


d.  ^06 


=  {y-K'{e))V-\n) 

y- 


Thus  we  can  write 


c(A,//)  =  /l22()u)y^/2(^) 


d_ 


{y  -  i^y 


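A further simplification is possible since

$$E\left[(y-\mu)^3\right] = E\left[y^3\right] - 3\mu E\left[y^2\right] + 2\mu^3. \qquad (3.3.17)$$

From Jørgensen (1992) we have the moment generating function

$$M(s;\theta,\lambda) = \exp\left[\lambda\left\{\kappa(\theta + s/\lambda) - \kappa(\theta)\right\}\right].$$

Thus

$$E\left[y^2\right] = \frac{\kappa''(\theta)}{\lambda} + \mu^2$$

and

$$E\left[y^3\right] = \frac{\kappa'''(\theta)}{\lambda^2} + \frac{3\mu\kappa''(\theta)}{\lambda} + \mu^3.$$

Substituting our calculations into (3.3.17) we have that

$$E\left[(y-\mu)^3\right] = \frac{\kappa'''(\theta)}{\lambda^2}.$$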
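
The identity $E[(y-\mu)^3] = \kappa'''(\theta)/\lambda^2$ can be checked in closed form for one member of the family. The Python sketch below is an illustration only. It assumes the gamma EDM is parametrized with mean $\mu$ and shape $\lambda$ (scale $\mu/\lambda$), so that $\kappa(\theta) = -\log(-\theta)$ with $\theta = -1/\mu$ and $\kappa'''(\theta) = 2\mu^3$; the function names are our own.

```python
def gamma_raw_moment(k, shape, scale):
    """k-th raw moment of a gamma(shape, scale) variable:
    scale**k * shape * (shape + 1) * ... * (shape + k - 1)."""
    moment = 1.0
    for j in range(k):
        moment *= (shape + j) * scale
    return moment


def third_central_moment_gamma(mu, lam):
    """E[(y - mu)^3] obtained from the raw moments via the expansion (3.3.17)."""
    scale = mu / lam                      # assumed parametrization: mean mu, shape lam
    m2 = gamma_raw_moment(2, lam, scale)
    m3 = gamma_raw_moment(3, lam, scale)
    return m3 - 3.0 * mu * m2 + 2.0 * mu ** 3


def kappa3_over_lambda_squared(mu, lam):
    """kappa'''(theta) / lambda^2 for the gamma EDM, where kappa'''(theta) = 2 mu^3."""
    return 2.0 * mu ** 3 / lam ** 2


for mu, lam in [(1.0, 0.5), (2.0, 3.0), (0.7, 10.0)]:
    lhs = third_central_moment_gamma(mu, lam)
    rhs = kappa3_over_lambda_squared(mu, lam)
    print(mu, lam, lhs, rhs)
```

The two printed columns agree up to floating point rounding for each pair of parameter values, as the derivation requires.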


Combining  (3.2.5)  and  (3.3.16)  gives 


dpi 


Since $V(\mu) = \kappa''(\theta)$ and


^  ^."(0)-/\'"(^)]  1^ 


i  h"(^)-^^v"'(.)]  -i^, 


we acquire the form of $c(\lambda,\mu)$ specified in the corollary.

We are now able to answer the existence question for EDM subtypes which are also Jørgensen proper since (3.3.16) is an equation depending on the variance function of the EDM. It is the variance function which completely characterizes the EDM (see Jørgensen (1992)). In the case where $\mu$ is the parameter of interest and $\lambda$ is regarded as a nuisance parameter, Corollary 3.3.1 gives that a second order probability matching prior exists iff


d_ 


A-V-^/^(m)|[/."(^)-^/V"'(^)] 


d9 


0. 


(3.3.18) 


Thus we require that

$$V^{-1/2}(\mu)\frac{\partial}{\partial\theta}\left[\kappa''(\theta)^{-3/2}\kappa'''(\theta)\right] = k \qquad (3.3.19)$$

for some constant $k$. Recalling that $V(\mu) = \kappa''(\theta)$ and $V(\mu) = \frac{\partial\mu}{\partial\theta}$, we may rewrite (3.3.19) as a differential equation involving the variance function.


(3.3.20) 


or 


(3.3.21) 


If $k = 0$ then by solving (3.3.21) for $V(\mu)$ we have that


V(/i)  =  Cif?  +  C2 


(3.3.22) 


where $C_1$ and $C_2$ are any constants. Thus a second order PM prior will exist for any EDM with a variance function of this form. Since $V(\mu) = \mu^2$ for the gamma EDM and $V(\mu) = 1$ for the normal EDM, second order PM priors will exist for these cases.

Remark  3.3.1  Note  that  we  have  proven  Corollary  (3.2.4)  as  promised. 

If $k \neq 0$, then by solving (3.3.20) for $V(\mu)$ we have that


for any constants $b_1$ and $b_2$. Because the integral does not have a closed form solution we are unable to write the solution as an explicit function of $\mu$. Nonetheless, (3.3.22) and (3.3.23) completely characterize the EDM Jørgensen proper subtypes for which second order PM priors can be found.


(3.3.23) 


3.4    Computer  Simulation 
3.4.1  Method 

Following Berger and Bernardo (1989) and Sun and Ye (1996), we investigate the performance of assorted noninformative priors for selected dispersion models. We accomplish this by calculating the frequentist coverage probability of the posterior tail probability for $\mu$ and $\lambda$. Let $\mu_\alpha$ denote the posterior $\alpha$-quantile of $\mu$ given $y = (y_1,\ldots,y_n)$. That is to say, $F(\mu_\alpha) = \int_{-\infty}^{\mu_\alpha}P_\pi(\mu\mid y)\,d\mu = \alpha$, where $P_\pi(\mu\mid y)$ is the marginal posterior distribution of $\mu$ under the prior $\pi$. Let

$$P_{(\mu,\lambda)}(\alpha;\mu) = P_{(\mu,\lambda)}(\mu < \mu_\alpha\mid\mu,\lambda) = P_{(\mu,\lambda)}(F(\mu) < F(\mu_\alpha)\mid\mu,\lambda).$$

Similarly, we can define $\lambda_\alpha$ and $P_{(\mu,\lambda)}(\alpha;\lambda)$ to be the posterior quantile of $\lambda$ and the corresponding frequentist coverage probability, respectively. If $P_\pi(\mu\mid y)$ (or $P_\pi(\lambda\mid y)$) yields quantiles so that $P_{(\mu,\lambda)}(\alpha;\mu)$ (or $P_{(\mu,\lambda)}(\alpha;\lambda)$) is close to $\alpha$, even if sample sizes are small, then we have evidence that the chosen prior performs well with respect to the probability matching criterion.

We chose to estimate $P_{(\mu,\lambda)}(\alpha;\mu)$ and $P_{(\mu,\lambda)}(\alpha;\lambda)$ for the Fisher-von Mises ($\mu^* = \pi$, $\lambda^* = 1$), lognormal ($\mu^* = 0$, $\lambda^* = 1$), and inverse Gaussian ($\mu^* = 1$, $\lambda^* = 1/2$) distributions. The parameters are starred to distinguish them from their random variable versions. Sample sizes considered are n = 2, 5, and 10; $\alpha$ is set at 0.05 and 0.95.

$P_{(\mu,\lambda)}(\alpha;\mu)$ and $P_{(\mu,\lambda)}(\alpha;\lambda)$ are estimated in the following way: for each value of n, 10,000 random samples $(y_1,\ldots,y_n)$ are generated. Splus software was used for the random number generation. Next, $F(Y_k;\mu^*)$ (or $F(Y_k;\lambda^*)$) is computed for each of the $k = 1,\ldots,10{,}000$ sets of $(y_1,\ldots,y_n)$. Then $P_{(\mu,\lambda)}(\alpha;\mu)$ is estimated by the proportion of $F(Y_k;\mu^*)$ observed to be $< \alpha$. Essentially we are counting how often we observe $\mu^* < \mu_\alpha$ (or $\lambda^* < \lambda_\alpha$).

The computing work is accomplished in two stages. In the first stage programs are written to generate 10,000 random samples of size n (2, 5, or 10) from each of the distributions considered in this work, namely, the lognormal, inverse Gaussian, and Fisher-von Mises. For each distribution, three collections of 10,000 random samples are generated. The sample size is varied for each collection, i.e., n = 2, 5, and 10.

The second stage consists of writing programs which will calculate $F(Y_k;\mu^*)$ and $F(Y_k;\lambda^*)$ where $k = 1,\ldots,10{,}000$ and $Y_k$ denotes the $k$th random sample generated. For each distribution two separate programs are required for computing $F(Y_k;\mu^*)$ and $F(Y_k;\lambda^*)$ respectively.
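
The two stages can be sketched in a few lines for one case in which $F(Y_k;\lambda^*)$ has a closed form. The Python code below is an illustration only and is not the Splus program used for the reported results. It assumes that the lognormal is parametrized so that $\log y \sim N(\mu, 1/\lambda)$ and that the reference prior is proportional to $\lambda^{-1}$; under these assumptions the marginal posterior of $\lambda$ is gamma with shape $(n-1)/2$ and rate $S/2$, where $S = \sum(\log y_i - \overline{\log y})^2$. Taking $n = 5$ makes the shape an integer, so the gamma distribution function reduces to a finite sum. All function names are our own.

```python
import math
import random


def erlang_cdf(x, k, rate):
    """Distribution function of a gamma variable with integer shape k and given rate."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= z / j                     # z**j / j!
        total += term
    return 1.0 - math.exp(-z) * total


def posterior_cdf_lambda(sample, lam_star):
    """F(Y_k; lambda*) under the assumed prior proportional to 1/lambda (odd n >= 3)."""
    n = len(sample)
    logs = [math.log(y) for y in sample]
    mean_log = sum(logs) / n
    s = sum((x - mean_log) ** 2 for x in logs)
    return erlang_cdf(lam_star, (n - 1) // 2, s / 2.0)


def estimate_coverage(alpha, n=5, mu_star=0.0, lam_star=1.0, reps=10000, seed=1):
    """Proportion of simulated samples with F(Y_k; lambda*) < alpha."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(lam_star)     # lambda is treated as a precision
    hits = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(mu_star, sigma) for _ in range(n)]   # stage one
        if posterior_cdf_lambda(sample, lam_star) < alpha:                # stage two
            hits += 1
    return hits / reps


print(estimate_coverage(0.05, reps=2000), estimate_coverage(0.95, reps=2000))
```

Other priors or even sample sizes lead to non-integer gamma shapes, for which an incomplete gamma routine would be needed in place of `erlang_cdf`.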

For the lognormal density we were able to easily generate our random samples using the rlnorm command in Splus. For the inverse Gaussian distribution we used the algorithm outlined in Chapter 4 of Chhikara and Folks (1989). The algorithm exploits the relationship between the inverse Gaussian and chi-square densities. For $X \sim IG(\mu,\lambda)$, the transformed variable

$$Y^2 = \frac{\lambda(X-\mu)^2}{\mu^2 X} \qquad (3.4.24)$$

is distributed as $\chi^2_1$. For a given value of $Y^2$, this equation has two roots, say $X_1$ and $X_2$, where

$$X_1 = \frac{\mu}{2\lambda}\left[2\lambda + \mu Y^2 - \sqrt{4\lambda\mu Y^2 + \mu^2Y^4}\right] \qquad (3.4.25)$$

and

$$X_2 = \frac{\mu^2}{X_1}. \qquad (3.4.26)$$

Michael et al. (1976) showed that a random sample from an inverse Gaussian distribution can be recovered from a random sample of a chi-square distribution by selecting a root, $X_1$ or $X_2$, with a certain probability.
The algorithm can be summarized in these steps:

1. Generate random numbers from the $\chi^2_1$ distribution (easily done in Splus using the rchisq command).

2. For each random value in step 1, compute the smaller root $X_1$ given above.

3. Perform a Bernoulli trial with probability of "success" $p = \mu/(\mu + X_1)$. (The Bernoulli trial can be simulated in Splus using the uniform random number generator, runif. If the value generated by runif is less than $\mu/(\mu + X_1)$, then the Bernoulli trial is considered a "success".)

4. If the trial results in a success, the root $X_1$ is chosen for the random observation from the inverse Gaussian distribution; otherwise the larger root $X_2$ is chosen.
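
The four steps above can be sketched in Python as follows. This is an illustration only, not the Splus program used for the simulations; it assumes the standard $IG(\mu,\lambda)$ parametrization with mean $\mu$, draws the $\chi^2_1$ variate as the square of a standard normal, and uses a function name of our own choosing.

```python
import math
import random


def sample_inverse_gaussian(mu, lam, rng):
    """Draw one IG(mu, lam) observation by the root-selection method of Michael et al. (1976)."""
    # Step 1: a chi-square variate with one degree of freedom.
    nu = rng.gauss(0.0, 1.0)
    y2 = nu * nu
    # Step 2: the smaller root X1 of lam * (x - mu)^2 = mu^2 * x * y2.
    x1 = (mu / (2.0 * lam)) * (
        2.0 * lam + mu * y2 - math.sqrt(4.0 * lam * mu * y2 + mu * mu * y2 * y2)
    )
    # Step 3: Bernoulli trial with success probability mu / (mu + X1).
    if rng.random() < mu / (mu + x1):
        return x1                 # Step 4: success, keep the smaller root
    return mu * mu / x1           # Step 4: failure, take the larger root X2 = mu^2 / X1


rng = random.Random(42)           # fixed seed so the draw is repeatable
draws = [sample_inverse_gaussian(1.0, 0.5, rng) for _ in range(20000)]
print(sum(draws) / len(draws))    # sample mean, to be compared with mu = 1
```

The parameter values $\mu = 1$ and $\lambda = 1/2$ match those used for the inverse Gaussian simulations in this section.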

The  generation  of  the  Fisher-von  Mises  data  is  accomplished  using  the  methods 
of  Best  and  Fisher  (1979).  We  obtained  our  random  samples  using  the  FORTRAN 
programs  of  Morrison  (1996). 

The programs which calculate $F(Y_k;\mu^*)$ and $F(Y_k;\lambda^*)$ for the lognormal, inverse Gaussian, and Fisher-von Mises distributions are given in the appendix.

For the Fisher-von Mises simulation when we are interested in estimating $P_{(\mu,\lambda)}(\alpha;\lambda)$, we can avoid intensive computing by applying the following results.

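Theorem 3.4.1 If a prior distribution for the Fisher-von Mises likelihood is a function of $\lambda$ only, then for the Fisher-von Mises distribution, $F(\theta_k;\lambda^*)$ is solely a function of

$$R_k = \sqrt{\left(\sum_{i=1}^{n}\cos\theta_{ki}\right)^2 + \left(\sum_{i=1}^{n}\sin\theta_{ki}\right)^2}$$

and $\lambda^*$.

Proof of Theorem 3.4.1 We prove this result by recalling that in the Fisher-von Mises case

$$F(\theta_k;\lambda^*) = \frac{\int_0^{\lambda^*}\int_0^{2\pi}g(\lambda)\exp\left[\lambda R_k\cos(\bar{\theta}_k-\mu)\right]d\mu\,d\lambda}{\int_0^{\infty}\int_0^{2\pi}g(\lambda)\exp\left[\lambda R_k\cos(\bar{\theta}_k-\mu)\right]d\mu\,d\lambda}
= \frac{\int_0^{\lambda^*}g(\lambda)I_0(R_k\lambda)\,d\lambda}{\int_0^{\infty}g(\lambda)I_0(R_k\lambda)\,d\lambda}, \qquad (3.4.27)$$

where $g(\lambda) = \pi(\lambda)/I_0^n(\lambda)$. The prior, which is solely a function of $\lambda$, is being denoted by $\pi(\lambda)$. Equation (3.4.27) establishes the result.

In light of Theorem 3.4.1, we will write $F(R_k;\lambda^*)$ in lieu of $F(\theta_k;\lambda^*)$.

Theorem 3.4.2 $F(R_k;\lambda^*)$ is monotonic in $R_k$.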

Proof  of  Theorem  3.4.2  We  proceed  by  differentiating  F(Rk;  A*)  with  respect  to  Rk 
and  showing  that  this  derivative  is  strictly  negative.  Since 


roo 

/  p(A)/o(RkA)dA 
Jo  J 


^dF(Ry,;X*) 


|^"p(A)/o(RkA)dA]  5(A)A/i(RkA)dA 


-    J^°°p(A)A/i(RkA)dA]  p(A)/o(RkA)dA 


rX'  /-OO  rX' 

/    5(A)/o(RkA)dA+  /    ^(A)/o(RkA)(iA     /  ^(A)A/i(RkA)ciA 

70  JX'  J  [70 

rX*  roo  ]  r  rX' 

/    9(A)A/i(RkA)dA+  /    5(A)A/i(RkA)dA     /  5(A)/o(RkA)£iA 

70  JX'  J  [70 

roo  rX' 

/    p(A)/o(RkA)dA  /  5(A)A/i(RkA)dA 

7A-  70 
roo  r\* 

/    5(A)A7i(RkA)dA  /  5(A)/o(RkA)rfA 

7A'  70 


roo  rX' 

-/     /    [g{s)Io{Ri,s)g{t)th(Ri,t)-g{s)sh{Rj,s)g{t)Io(Ri,t)]dtds.  (3.4.28) 

7A*  70 

We  verify  that  the  integrand  in  the  RHS  of  (3.4.28)  is  negative  by  showing  that 


^/i(Rk^)  ^  „/i(Rkg) 
/o(Rkt)  ^/o(Rks)' 


(3.4.29) 


Table  3.2.  Simulated  Probabilities  of  Marginal  Posterior  Distributions  of  Under 
Different  Priors  and  Sample  Sizes  with  the  Inverse  Gaussian  Likelihood. 


P(^,A)(0.05;/.) 

P(^,;,)(0.95;//) 

n 

T^jeffreys' 

7rre/(M,A) 

^je  f  freys' 

'^ref{^I.■^X,\) 

7rre/{M,A) 

2 

.1094 

.0916 

.0320 

.0550 

.8385 

.8869 

.8732 

.9395 

5 

.0727 

.0637 

.0344 

.0540 

.9361 

.9476 

.9369 

.9590 

10 

.0580 

.0541 

.0381 

.0497 

.9395 

.9453 

.9365 

.9506 

when  t  <  s  (which  is  our  case).  Let 

By  differentiating  w{z)  and  establishing  w'iz)  >  0,  we  verify  (3.4.29).  Using  the 
quotient  rule  and  (2.2.33)  we  have  that 

l'o{z)w'{z)  =  zf,{z)  -  zlUz)  +  \lo{z)h{z).  (3.4.31) 

From  Soni  (1965)  we  have  that 

Ii{z)  <  Io{z),  (3.4.32) 

thus $w'(z) > 0$ and hence $F(R_k;\lambda^*)$ is monotonic in $R_k$.

The computing burden is now vastly reduced since by ordering our $R_k$'s we can find that value, call it $R_{.05}$, such that $F(R_{.05};\lambda^*) < .05$. We may also find the value, call it $R_{.95}$, such that $F(R_{.95};\lambda^*) < .95$. We are able then to estimate $P_{(\mu,\lambda)}(\alpha;\lambda)$ by counting the number of $R_k$'s which are less than $R_{.05}$ and the number of $R_k$'s which are greater than $R_{.95}$.

3.4.2  Results

Table 3.3.  Simulated Probabilities of Marginal Posterior Distributions of $\mu$ under
Different Priors and Sample Sizes with the Log Normal Likelihood.

              P(0.05; mu)                      P(0.95; mu)
 n     Jeffreys'    ref(mu,lambda)      Jeffreys'    ref(mu,lambda)
 2      .1404          .0499             .8563          .9487
 5      .0773          .0533             .9285          .9517
10      .0615          .0502             .9226          .9490

Table  3.4.  Simulated  Probabilities  of  Marginal  Posterior  Distributions  of  /x  Under 
Different  Priors  and  Sample  Sizes  with  the  Fisher-von  Mises  Likelihood. 


P(^,a)(0.05;m) 

P(^,A)(0.95;//) 

n 

Jeffreys' 

T^BK 

7rre/(M,A) 

Jeffreys' 

T^BK 

7rre/(/z,A) 

5 

.0605 

.0797 

.0281 

.0518 

.9489 

.9291 

.9756 

.9570 

10 

.0564 

.0698 

.0323 

.0538 

.9421 

.9334 

.9668 

.9475 

Tables  3.2-3.4  provide  the  simulated  tail  probabilities  of  posterior  distributions  of 
/i  under  different  priors,  likelihoods  and  sample  sizes  when  the  frequentist  tail  prob- 
abilities are  set  at  .05  and  .95.  For  the  inverse  Gaussian  distribution,  four  different 
priors  are  considered:  it  Jeffreys'  (Jeffreys'),  'jrref{n,x)  (reference  prior  under  rectangu- 
lar compacts  on  (//,A)),  TTref(fi^x,x)  (reference  prior  under  rectangular  compacts  on 
(/i^A,  A)),  and  ttbb  (the  prior  of  Banerjee  and  Bhattacharyya).  With  the  exception  of 
tvbb,  these  priors  are  all  FOPMC.  This  is  supported  in  Table  1.  where  the  posterior 
tail  probabilities  under  ttbb  are  very  much  off  their  frequentist  counterparts  even  for 
n=10.  The  performance  of  TTref{n,x)  is  clearly  the  best,  and  is  very  much  on  target 
even  for  a  sample  size  as  small  as  2.  The  other  priors  Jeffreys'  and  'Kref{tJ.'^x,x)  do 
not perform well when n=2, but improve as the sample size increases. Thus, for the inverse Gaussian example, when $\mu$ is the parameter of interest, $\pi_{ref(\mu,\lambda)}$ seems to be the appropriate choice, and this is consistent with the findings of Liseo (1993) and Sun and Ye (1996).


Table 3.5.  Simulated Probabilities of Marginal Posterior Distributions for $\lambda$ under
Different Priors and Sample Sizes with the Lognormal Likelihood.

              P(0.05; lambda)                  P(0.95; lambda)
 n     Jeffreys'    ref(mu,lambda)      Jeffreys'    ref(mu,lambda)
 2      .2532          .0498             .9844          .9482
 5      .1193          .0537             .9739          .9500
10      .0815          .0487             .9686          .9491

Next,  for  the  lognormal  distribution,  the  two  priors  considered  are  i^jeffreys'  (Jef- 
freys'), and  'Kref{ti,\)  (reference  prior  under  rectangular  compacts  on  (//,  A)).  The  prior 
7i're/(/i,A)  IS  the  uniquc  second  order  optimal  prior,  but  i^jeffreys'  is  not  even  first  order 
optimal.  This  is  clearly  reflected  in  the  figures  of  Table  2. 

For the Fisher-von Mises distribution, a similar simulation study is performed.
The competing priors in this case are $\pi_{Jeffreys'}$ (Jeffreys'), $\pi_{ref(\mu,\lambda)}$ (reference prior
under rectangular compacts on $(\mu,\lambda)$), $\pi_{BK}$ (the prior of Bagchi and Kadane), and
$\pi_{W}$ (the unique second order optimal prior). Clearly $\pi_{W}$ is typically much closer to
the target than all its competitors, and the simulations once again bear out the theory
that we have already developed.

Table 3.5 provides the simulated tail probabilities of posterior distributions of $\lambda$
under  Jeffreys'  and  reference  priors  for  the  lognormal  likelihood.  Sample  sizes  were 
taken  to  be  n  =  2,  n  =  5,  and  n  =  10.  The  frequentist  tail  probabilities  are  set  at 
.05  and  .95.  The  results  in  Table  3.5  illustrate  that  the  reference  prior  outperforms 
Jeffreys'  prior  decisively.  In  this  case  the  reference  prior  is  second  order  optimal  while 
Jeffreys'  prior  is  not  even  first  order  optimal. 
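The Jeffreys' column of Table 3.5 can be cross-checked without simulation. The sketch below rests on assumptions suggested by the lognormal program in Appendix A rather than stated in this section: the logarithms of the observations are normal with precision $\lambda$, and the marginal posterior of $\lambda$ under Jeffreys' prior is gamma with shape $n/2$ and rate $S/2$, where $S$ is the corrected sum of squares of the logarithms. Since $\lambda S$ is chi-square with $n-1$ degrees of freedom, the frequentist probability that the posterior distribution function at the true $\lambda$ falls below $\alpha$ is $P(\chi^2_{n-1} < \chi^2_{n,\alpha})$. The function names and the series-based chi-square routines are illustrative only.

```python
import math

def gamma_cdf(x, a):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0:
        return 0.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def chi2_cdf(x, df):
    """Chi-square distribution function with df degrees of freedom."""
    return gamma_cdf(x / 2.0, df / 2.0)

def chi2_ppf(p, df):
    """Chi-square quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def jeffreys_tail(alpha, n):
    """P(posterior cdf at the true lambda < alpha) under the assumed Jeffreys' posterior."""
    return chi2_cdf(chi2_ppf(alpha, n), n - 1)

for n in (2, 5, 10):
    print(n, round(jeffreys_tail(0.05, n), 4), round(jeffreys_tail(0.95, n), 4))
```

Under the same assumptions the reference prior gives a gamma posterior with shape $(n-1)/2$, so the corresponding probability equals $\alpha$ exactly, which agrees with the near-nominal reference-prior entries of Table 3.5.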


3.5    Ball Bearing Data

In 1828 Robert Brown, one of England's greatest botanists, wrote a pamphlet describing
the swimming, dancing motion of pollen particles immersed in water. This physical
phenomenon was to become known as Brownian motion. Brownian motion is modeled
mathematically by what is referred to as a Wiener process. Since the IG is the first
passage time distribution for the Wiener process, it is particularly appropriate for
modeling failure times. Chhikara and Folks (1989) revisited test data on the endurance
of deep groove ball bearings first presented in Lieblein and Zelen (1956). The data
consist of the number of million revolutions before failure for each of 23 ball bearings
used in the test and are given in Table 3.6.

Table 3.6. Ball Bearing Data
 17.88    28.92    33.00    41.52    42.12    45.60    48.48
 51.84    51.96    54.12    55.56    67.80    68.64    68.64
 68.88    84.12    93.12    98.64   105.12   105.84   127.92
128.04   173.40

The application of the IG distribution does not require the specification of the
underlying Wiener process and none is given in the revisitation. Rather, the use of the
IG distribution is justified by goodness of fit considerations. Chhikara and Folks
calculated the observed value of the Kolmogorov-Smirnov statistic as 0.0994, indicating
a good fit of the ball bearing data to the IG model.

Chhikara and Folks report the MLE of $\mu$, the mean (in millions of revolutions
before failure), as 72.22, and give a 95% confidence interval for $\mu$ as (57.9, 95.91).
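As a small check on the reported point estimate, the following sketch computes the inverse Gaussian maximum likelihood estimates from the 23 failure times of Table 3.6. It uses the standard closed forms $\hat\mu = \bar y$ and $1/\hat\lambda = n^{-1}\sum_i (1/y_i - 1/\bar y)$; these formulas and the variable names are supplied here for illustration and are not quoted from Chhikara and Folks.

```python
# Ball bearing failure times (millions of revolutions), Table 3.6.
y = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48,
     51.84, 51.96, 54.12, 55.56, 67.80, 68.64, 68.64,
     68.88, 84.12, 93.12, 98.64, 105.12, 105.84, 127.92,
     128.04, 173.40]

n = len(y)

# MLE of the inverse Gaussian mean is the sample mean.
mu_hat = sum(y) / n

# MLE of lambda: 1 / lambda_hat = (1/n) * sum(1/y_i - 1/mu_hat).
lam_hat = n / sum(1.0 / v - 1.0 / mu_hat for v in y)

print(n, round(mu_hat, 2), lam_hat)
```

The sample mean reproduces the value 72.22 quoted above.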
We will now proceed with a Bayesian analysis of the ball bearing data. The graphs
of the marginal posteriors of $\mu$ under the four noninformative priors discussed in
Section 2.2.4 (Jeffreys' prior, the two reference priors, and the prior of Banerjee and
Bhattacharyya) are given in Figure 3.1.


Figure 3.1. Marginal posteriors of $\mu$ under various priors


Although these posterior densities possess no moments, we can report 95% credible
sets for $\mu$, as well as the posterior median and mode.

Prior                                                   95% credible set for $\mu$   posterior median   posterior mode
$\pi_{Jeffreys'} \propto \mu^{-3/2}\lambda^{-1/2}$      (58.54, 96.26)               72.75              70.87
                                                        (58.26, 97.14)               72.74              70.81
                                                        (58.47, 96.97)               72.76              70.84
                                                        (57.91, 95.95)               72.23              70.36

3.6   Roulette Wheel Data

The characterizations of the Fisher-von Mises distribution make it highly desirable
for modeling angular data. These include the maximum likelihood and maximum
entropy characterizations. Von Mises proved the former by showing that for angular
data with location, the sample mean is the MLE of $\mu$ if and only if the random variable
follows the Fisher-von Mises distribution. Mardia (1972, p. 65) gives the proof that the
Fisher-von Mises distribution has maximum entropy amongst distributions on the circle.

Mardia (1972) gives an example of a real data set obtained by spinning a roulette
wheel and noting its stopping positions. Of interest is whether or not
the wheel has a preferred stopping location. The measurements in 9 trials were
43°, 45°, 52°, 61°, 75°, 88°, 88°, 279°, 357°. The data are graphed in Figure 3.2. The MLE
of $\mu$ is reported as 51.0°. Following derivations given in Mardia (1972, p. 145),
a 95% uniformly most accurate unbiased confidence interval for $\mu$ was calculated to
be (15°, 87°).
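The reported MLE of $\mu$ can be reproduced directly, since for the Fisher-von Mises distribution it is the sample mean direction. The sketch below computes that direction and the mean resultant length for the nine stopping positions; the variable names are illustrative.

```python
import math

# Roulette wheel stopping positions in degrees (Mardia, 1972).
angles_deg = [43, 45, 52, 61, 75, 88, 88, 279, 357]

# Sum the unit vectors; their direction is the sample mean direction.
c = sum(math.cos(math.radians(a)) for a in angles_deg)
s = sum(math.sin(math.radians(a)) for a in angles_deg)

mu_hat_deg = math.degrees(math.atan2(s, c)) % 360.0

# Mean resultant length: near 1 for tightly clustered data, near 0 for dispersed data.
r_bar = math.hypot(c, s) / len(angles_deg)

print(round(mu_hat_deg, 1), round(r_bar, 3))
```

Averaging the raw angles arithmetically would be misleading here because 279° and 357° sit on the other side of the 0°/360° cut; working with unit vectors avoids that problem.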

We proceed with a Bayesian analysis of the roulette data. The graphs of $\Pi_{J}(\mu|y)$,
$\Pi_{ref(\mu,\lambda)}(\mu|y)$, $\Pi_{Wilson}(\mu|y)$, and $\Pi_{BK}(\mu|y)$, the marginal posterior distributions of $\mu$
under the noninformative priors discussed in Section 2.2.5, are given in Figure 3.3.

For each marginal posterior, 95% credible sets for $\mu$, as well as the posterior mean
and standard error, are given in Table 3.7.



Figure  3.2.  Roulette  Wheel  Data 

The important point to note here is that the 95% credible set for $\mu$ under $\pi_{Wilson}$
matches the UMAU confidence interval best. This result conforms with the theory
already developed.
Table 3.7. Credible sets for various noninformative priors

Prior                                                                                          95% credible set for $\mu$   posterior mean   standard error
$\pi_{Jeffreys'} \propto [\lambda A(\lambda)(1 - \lambda^{-1}A(\lambda) - A^2(\lambda))]^{1/2}$   (20.6°, 81.9°)               52.6°            3.4°
$\pi_{ref(\mu,\lambda)} \propto [1 - \lambda^{-1}A(\lambda) - A^2(\lambda)]^{1/2}$                (11°, 90°)                   54.8°            4.6°
$\pi_{Wilson} \propto \lambda(1 - \lambda^{-1}A(\lambda) - A^2(\lambda))$                         (15.5°, 87.7°)               53.1°            3.8°
$\pi_{BK} \propto \lambda$                                                                        (19.8°, 83.1°)               52.0°            2.9°
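One simple way to quantify how well each credible set in Table 3.7 matches the UMAU interval (15°, 87°) is to add up the absolute differences of the endpoints. This distance is an illustrative choice made here, not a criterion used in the analysis above.

```python
# UMAU confidence interval and the credible sets of Table 3.7, in degrees.
umau = (15.0, 87.0)
credible = {
    "Jeffreys": (20.6, 81.9),
    "reference": (11.0, 90.0),
    "Wilson": (15.5, 87.7),
    "BK": (19.8, 83.1),
}

def endpoint_gap(interval, target):
    """Sum of absolute differences between corresponding interval endpoints."""
    return abs(interval[0] - target[0]) + abs(interval[1] - target[1])

gaps = {name: endpoint_gap(ci, umau) for name, ci in credible.items()}
closest = min(gaps, key=gaps.get)

for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(name, round(gap, 1))
```

By this measure the credible set under the Wilson prior is the closest to the UMAU interval, in line with the remark preceding the table.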

CHAPTER  4 

ESTABLISHING  PROPRIETY  OF  POSTERIORS  IN  THE  REGRESSION  SETTING 




4.1    Formulation  of  the  Dispersion  Regression  Model 


Suppose $Y_1,\ldots,Y_n$ are independent random variables where $Y_i$ has a density
belonging to the Jørgensen proper dispersion model family

$$f(y_i \mid \mu_i, \lambda) = a(\lambda)\, b(y_i) \exp\left[\lambda\, t(y_i, \mu_i)\right], \qquad (4.1.1)$$

$i = 1,\ldots,n$. Further assume that $\mu_i$ is related to a set of explanatory variables,
$x_i' = (x_{i1},\ldots,x_{ip})$, via the relation

$$\mu_i = g(x_i'\beta) \qquad (4.1.2)$$

where $\beta = (\beta_1,\ldots,\beta_p)'$ is a $p \times 1$ vector of regression coefficients and $g(\cdot)$, the link
function, is a postulated monotonic differentiable function. The model in (4.1.1)
includes the GLM as a special case.

We are interested in formulating a Bayesian analysis of this model. In particular we
wish to make inference regarding $(\beta_1,\ldots,\beta_p,\lambda)$. The cases which will be considered
are:

(i) $\beta_1,\ldots,\beta_p$ are parameters of interest and $\lambda$ is considered a nuisance parameter.

(ii) $\lambda$ is the parameter of interest and $\beta_1,\ldots,\beta_p$ are considered nuisance parameters.

(iii) $\beta_1,\ldots,\beta_p$ and $\lambda$ are all parameters of interest.

We seek suitable noninformative priors for these cases. As a first step we calculate
the information matrix for $\beta$ and $\lambda$.
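Let $\mu = (\mu_1,\ldots,\mu_n)$. Then the log likelihood for $(\mu,\lambda)$ is given by

$$\ell(\mu,\lambda) = n\log a(\lambda) + \sum_{i=1}^n \log b(y_i) + \lambda\sum_{i=1}^n t(y_i,\mu_i). \qquad (4.1.3)$$

Replacing $\mu_i$ by $g(x_i'\beta)$ we have

$$\ell(\beta,\lambda) = n\log a(\lambda) + \sum_{i=1}^n \log b(y_i) + \lambda\sum_{i=1}^n t(y_i, g(x_i'\beta)). \qquad (4.1.4)$$

Thus for $j = 1,\ldots,p$ and $k = 1,\ldots,p$ we have

$$\frac{\partial \ell(\beta,\lambda)}{\partial\beta_j} = \lambda\sum_{i=1}^n \frac{\partial}{\partial\beta_j}\, t(y_i, g(x_i'\beta)) = \lambda\sum_{i=1}^n \frac{\partial t(y_i,\mu_i)}{\partial\mu_i}\, g'(x_i'\beta)\, x_{ij}, \qquad (4.1.5)$$

and

$$\frac{\partial^2 \ell(\beta,\lambda)}{\partial\beta_j\,\partial\beta_k} = \lambda\sum_{i=1}^n \frac{\partial^2}{\partial\beta_j\,\partial\beta_k}\, t(y_i, g(x_i'\beta)) \qquad (4.1.6)$$

$$= \lambda\sum_{i=1}^n \left[\frac{\partial^2 t(y_i,\mu_i)}{\partial\mu_i^2}\,\{g'(x_i'\beta)\}^2 + \frac{\partial t(y_i,\mu_i)}{\partial\mu_i}\, g''(x_i'\beta)\right] x_{ij}x_{ik}, \qquad (4.1.7)$$

so that

$$E\left[\frac{\partial^2 \ell(\beta,\lambda)}{\partial\beta_j\,\partial\beta_k}\right] = \lambda\sum_{i=1}^n E\left[\frac{\partial^2 t(Y_i,\mu_i)}{\partial\mu_i^2}\right]\{g'(x_i'\beta)\}^2 x_{ij}x_{ik} + \lambda\sum_{i=1}^n E\left[\frac{\partial t(Y_i,\mu_i)}{\partial\mu_i}\right] g''(x_i'\beta)\, x_{ij}x_{ik}. \qquad (4.1.8)$$

The second term in (4.1.8) vanishes since

$$E\left[\frac{\partial t(Y_i,\mu_i)}{\partial\mu_i}\right] = 0. \qquad (4.1.9)$$

Compactly we can write

$$E\left[\frac{\partial^2 \ell(\beta,\lambda)}{\partial\beta\,\partial\beta'}\right] = -\lambda X'\Delta^2 V X, \qquad (4.1.10)$$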
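where $X$ is the $n \times p$ matrix of covariates with $i$th row equal to $x_i'$, $\Delta = \mathrm{Diag}\left(\partial\mu_i/\partial\eta_i\right)$
where $\eta_i = x_i'\beta$, and $V = \mathrm{Diag}\left(E\left[-\partial^2 t(Y_i, g(x_i'\beta))/\partial\mu_i^2\right]\right)$.

Since

$$\frac{\partial \ell(\beta,\lambda)}{\partial\lambda} = n\frac{d}{d\lambda}\log a(\lambda) + \sum_{i=1}^n t(y_i, g(x_i'\beta)) \qquad (4.1.11)$$

and

$$\frac{\partial^2 \ell(\beta,\lambda)}{\partial\beta_j\,\partial\lambda} = \sum_{i=1}^n \frac{\partial}{\partial\beta_j}\, t(y_i, g(x_i'\beta)), \qquad (4.1.12)$$

we have that

$$E\left[\frac{\partial^2 \ell(\beta,\lambda)}{\partial\beta_j\,\partial\lambda}\right] = 0 \qquad (4.1.13)$$

and

$$-\frac{\partial^2 \ell(\beta,\lambda)}{\partial\lambda^2} = -n\frac{d^2}{d\lambda^2}\log a(\lambda). \qquad (4.1.14)$$

The information matrix is then calculated and can be written as

$$I(\beta,\lambda) = \begin{pmatrix} \lambda X'\Delta^2 V X & 0 \\ 0' & -n\dfrac{d^2}{d\lambda^2}\log a(\lambda) \end{pmatrix}. \qquad (4.1.15)$$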


We are now in position to apply a theorem due to Datta and Ghosh (1995b) which
gives the reference prior for case (i) and case (ii) as

$$\pi_{ref(\beta,\lambda)} \propto \left|X'\Delta^2 V X\right|^{1/2}\left[-\frac{d^2}{d\lambda^2}\log a(\lambda)\right]^{1/2} \qquad (4.1.16)$$

(assuming rectangular compactification of the parameter space is possible). In the
case where all parameters are of interest, the reference prior is the same as
Jeffreys' prior and is given by

$$\pi_{Jeffreys'} \propto \lambda^{p/2}\left|X'\Delta^2 V X\right|^{1/2}\left[-\frac{d^2}{d\lambda^2}\log a(\lambda)\right]^{1/2}. \qquad (4.1.17)$$
4.2    An Extension of the Ibrahim-Laud Result

In order to ensure that the posterior density will provide valid inference, we must
verify its propriety. Following Ibrahim and Laud (1991), we give sufficient conditions
for the propriety of the posterior under Jeffreys' prior ($\pi_{Jeffreys'}$) and the reference prior
($\pi_{ref(\mu,\lambda)}$) for the case when $\lambda$ is known and when $\lambda$ is unknown.

Theorem 4.2.1 Suppose the likelihood function of $(\beta,\lambda)$ is of the form of a Jørgensen
proper dispersion model, i.e.

$$L(\beta,\lambda) = a^n(\lambda)\left[\prod_{i=1}^n b(y_i)\right]\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right]. \qquad (4.2.18)$$

Suppose also that $\lambda$ is known. Assume that $X$ is of full rank and that the likelihood
of $\beta$ is bounded above. Then a sufficient condition for the existence of the posterior
density under Jeffreys' prior for any member of Jørgensen's dispersion model family
is that the integral

$$\int_{-\infty}^{\infty}\left[E\left(-\frac{\partial^2 t(Y,r)}{\partial r^2}\right)\right]^{1/2}\exp\left[\lambda\, t(y,r)\right] dr \qquad (4.2.19)$$

is finite.

Proof of Theorem 4.2.1 The posterior under Jeffreys' prior in the case where $\lambda$ is
considered known is proportional to

$$\left|X'\Delta^2 V X\right|^{1/2} \times \exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right]. \qquad (4.2.20)$$

Applying the Cauchy-Binet theorem from linear algebra, we have that

$$\left|X'\Delta^2 V X\right|^{1/2} = \left[\sum_{T} c^2(x_{i_1},\ldots,x_{i_p})\prod_{j=1}^p v_{i_j}\delta_{i_j}^2\right]^{1/2}, \qquad (4.2.21)$$
where $T = \{(i_1,\ldots,i_p) : 1 \le i_1 < \cdots < i_p \le n\}$, $x_{i_j}'$ is the $i_j$th row of $X$,
$c(x_{i_1},\ldots,x_{i_p}) = |X^*|$, and $X^* = (x_{i_1},\ldots,x_{i_p})$ is a $p \times p$ matrix with $j$th column
$x_{i_j}$ $(j = 1,\ldots,p)$. The diagonal elements of $V$ and $\Delta^2$ are denoted by $v_i$ and $\delta_i^2$,
respectively.

The application of the Cauchy-Binet theorem will allow us to dispense with cumbersome
determinant notation and allow us to give a sufficient condition for propriety of
the posterior which is free of the design matrix $X$.
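The Cauchy-Binet expansion in (4.2.21) can be checked numerically on a small example. The sketch below compares $|X'WX|$, where $W$ is a diagonal matrix of positive weights standing in for the diagonal of $\Delta^2 V$, with the sum over index sets of the squared $p \times p$ minors times the corresponding products of weights. The design matrix and weights are made-up numbers chosen only for illustration.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small p)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def weighted_gram(X, w):
    """Return X' diag(w) X for an n x p list-of-lists X and a weight list w."""
    n, p = len(X), len(X[0])
    return [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
            for a in range(p)]

def cauchy_binet_sum(X, w):
    """Sum over index sets T of det(X_T)^2 times the product of weights in T."""
    n, p = len(X), len(X[0])
    total = 0.0
    for T in combinations(range(n), p):
        sub = [X[i] for i in T]  # rows i_1 < ... < i_p of X
        prod_w = 1.0
        for i in T:
            prod_w *= w[i]
        total += det(sub) ** 2 * prod_w
    return total

# Hypothetical 4 x 2 design (intercept and one covariate) with positive weights.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.5]]
w = [0.7, 1.2, 0.4, 2.0]

lhs = det(weighted_gram(X, w))
rhs = cauchy_binet_sum(X, w)
print(lhs, rhs)
```

The two quantities agree up to floating-point error, which is the identity the proof relies on before taking square roots.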

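Now

$$\int_{R^p}\left|X'\Delta^2 V X\right|^{1/2}\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right] d\beta \qquad (4.2.22)$$

$$\le \sum_{T}\left|c(x_{i_1},\ldots,x_{i_p})\right|\int_{R^p}\left(\prod_{j=1}^p v_{i_j}\delta_{i_j}^2\right)^{1/2}\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right] d\beta \qquad (4.2.23)$$

by the $c_r$ inequality. To show that the posterior of $\beta$ is proper it is sufficient to show
that for every element $(i_1,\ldots,i_p)$ of the index set $T$, the $p$-dimensional integral in
(4.2.23) converges. Without loss of generality let $i_1 = 1,\ldots,i_p = p$ in the remainder
of the proof. We also assume that $X^*$ is nonsingular, since if it is singular then
$c(x_1,\ldots,x_p) = 0$.

By assuming that the likelihood of $\beta$ is bounded above we have that

$$\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right] \le \exp[M]\exp\left[\lambda\sum_{j=1}^p t(y_j, g(x_j'\beta))\right] \qquad (4.2.24)$$

for some positive constant $M$. We then can establish that the posterior will exist if
we can show that

$$\int_{R^p}\left(\prod_{j=1}^p v_j\delta_j^2\right)^{1/2}\exp\left[\lambda\sum_{j=1}^p t(y_j, g(x_j'\beta))\right] d\beta \qquad (4.2.25)$$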
is finite. We now make the linear transformation $u = X^{*\prime}\beta$ (so $u_j = x_j'\beta$). The
Jacobian of this transformation is $|X^*|^{-1}$ so that (4.2.25) is proportional to

$$\prod_{j=1}^p\int_{-\infty}^{\infty}\left(v_j\delta_j^2\right)^{1/2}\exp\left[\lambda\, t(y_j, g(u_j))\right] du_j. \qquad (4.2.26)$$

Letting $r_j = g(u_j)$ we have

$$\prod_{j=1}^p\int_{-\infty}^{\infty}\left[E\left(-\frac{\partial^2 t(Y_j,r_j)}{\partial r_j^2}\right)\right]^{1/2}\exp\left[\lambda\, t(y_j,r_j)\right] dr_j. \qquad (4.2.27)$$

We then see that (4.2.26) is a product of $p$ one-dimensional integrals and is finite
if for each $j = 1,\ldots,p$, (4.2.19) is finite.

Next we consider the case when both $\beta$ and $\lambda$ are unknown. The following theorem
provides sufficient conditions for the propriety of posteriors under a suitable class of
priors including both the Jeffreys' and reference priors.

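Theorem 4.2.2 Suppose the likelihood function of $(\beta,\lambda)$ is of the form of a Jørgensen
proper dispersion model, i.e.

$$L(\beta,\lambda) = a^n(\lambda)\left[\prod_{i=1}^n b(y_i)\right]\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right]. \qquad (4.2.28)$$

Assume that $X$ is of full rank. Then a sufficient condition for the existence of the
posterior density under the prior

$$\pi(\beta,\lambda) \propto \lambda^{\alpha}\left|X'\Delta^2 V X\right|^{1/2}\left[-\frac{d^2}{d\lambda^2}\log a(\lambda)\right]^{1/2} \qquad (4.2.29)$$

for any member of Jørgensen's dispersion model family is that the integral

$$\int_0^{\infty}\int_{R^p}\left(\prod_{j=1}^p v_j\delta_j^2\right)^{1/2}\lambda^{\alpha}\left[-\frac{d^2}{d\lambda^2}\log a(\lambda)\right]^{1/2} a^n(\lambda)\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right] d\beta\, d\lambda \qquad (4.2.30)$$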
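is finite.

Remark 4.2.1 For Jeffreys' prior $\alpha = 1/2$ and for the reference prior $\alpha = 0$.

Proof of Theorem 4.2.2 The posterior is proportional to

$$\lambda^{\alpha}\left|X'\Delta^2 V X\right|^{1/2}\left[-\frac{d^2}{d\lambda^2}\log a(\lambda)\right]^{1/2} \times a^n(\lambda)\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right]. \qquad (4.2.31)$$

Applying the Cauchy-Binet theorem from linear algebra, we have that

$$\left|X'\Delta^2 V X\right|^{1/2} = \left[\sum_{T} c^2(x_{i_1},\ldots,x_{i_p})\prod_{j=1}^p v_{i_j}\delta_{i_j}^2\right]^{1/2}, \qquad (4.2.32)$$

where $T = \{(i_1,\ldots,i_p) : 1 \le i_1 < \cdots < i_p \le n\}$, $x_{i_j}'$ is the $i_j$th row of $X$,
$c(x_{i_1},\ldots,x_{i_p}) = |X^*|$, and $X^* = (x_{i_1},\ldots,x_{i_p})$ is a $p \times p$ matrix with $j$th column
$x_{i_j}$ $(j = 1,\ldots,p)$. Recall that the diagonal elements of $V$ and $\Delta^2$ are denoted by $v_i$
and $\delta_i^2$, respectively.

Again, the application of the Cauchy-Binet theorem will allow us to dispense with
cumbersome determinant notation. Now

$$\int_0^{\infty}\int_{R^p}\Pi(\beta,\lambda \mid y)\, d\beta\, d\lambda \propto \int_0^{\infty}\int_{R^p}\left[\sum_{T} c^2(x_{i_1},\ldots,x_{i_p})\prod_{j=1}^p v_{i_j}\delta_{i_j}^2\right]^{1/2}\lambda^{\alpha}\left[-\frac{d^2}{d\lambda^2}\log a(\lambda)\right]^{1/2} a^n(\lambda)\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right] d\beta\, d\lambda \qquad (4.2.33)$$

$$\le \sum_{T}\left|c(x_{i_1},\ldots,x_{i_p})\right|\int_0^{\infty}\int_{R^p}\left(\prod_{j=1}^p v_{i_j}\delta_{i_j}^2\right)^{1/2}\lambda^{\alpha}\left[-\frac{d^2}{d\lambda^2}\log a(\lambda)\right]^{1/2} a^n(\lambda)\exp\left[\lambda\sum_{i=1}^n t(y_i, g(x_i'\beta))\right] d\beta\, d\lambda \qquad (4.2.34)$$

by the $c_r$ inequality. To show that the posterior of $(\beta,\lambda)$ is proper it is sufficient to
show that for every element $(i_1,\ldots,i_p)$ of the index set $T$, the $(p+1)$-dimensional
integral in (4.2.34) converges.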




4.3    Establishing  Propriety  for  the  Power  Distribution 


Suppose $Y_1,\ldots,Y_n$ are independent random variables, where $Y_i$ has a power density
of the form

AT 


.-Myi-tiil" 


5=1/7  >1     -oo</i<oo    -oo<j/<oo    A>0,  (4.3.35) 

where $i = 1,\ldots,n$. Further assume that $\mu_i$ is related to a set of explanatory variables,
$x_i' = (x_{i1},\ldots,x_{ip})$, via the relation

$$\mu_i = g(x_i'\beta) \qquad (4.3.36)$$

where $\beta = (\beta_1,\ldots,\beta_p)'$ is a $p \times 1$ vector of regression coefficients and $g(\cdot)$, the link
function, is a postulated monotonic differentiable function.
From (4.1.16) we have that


Tr{p,  A)  oc  A 


l+Q 


X'A^VX 


1/2 


(4.3.37) 


In  order  to  establish  that  the  posterior  is  proper,  in  view  of  Theorem  4.2.2,  we 
must  show  that 


Jo  r"(A)  Jiv' 


lo  r"(A) 


exp 


-XY,\yi-g{x'M' 


dp 


dX  (4.3.38) 


is finite, where $\alpha = 1/2$ for the posterior under Jeffreys' prior and $\alpha = 0$ for the
posterior under the reference prior. We assume $n > p$. Note that


1=1 


dp 


j=p+l 


exp 


dp 


dp. 


(4.3.39) 
By applying the Cauchy-Binet theorem we realize that if $|X^*| = 0$, then $c(x_{i_1},\ldots,x_{i_p}) = 0$
and this term makes no contribution. Otherwise $|X^*| \ne 0$ and there is a one-to-one
transformation from $\beta = (\beta_1,\ldots,\beta_p)$ to $(x_{i_1}'\beta,\ldots,x_{i_p}'\beta)$. We assume without loss
of generality that $i = 1,\ldots,p$ are the members of the index set $T$ defined in the proof
of Theorem 4.2.2. After making a change of variables (4.3.39) can be written as

$$\prod_{j=1}^p\int_{-\infty}^{\infty} v_j^{1/2}\delta_j\exp\left[-\lambda\left|y_j - g(u_j)\right|^{\delta}\right] du_j. \qquad (4.3.40)$$

By letting $r_j = y_j - g(u_j)$ and making another change of variables we note that

$$\int_{-\infty}^{\infty} v_j^{1/2}\delta_j\exp\left[-\lambda\left|y_j - g(u_j)\right|^{\delta}\right] du_j = \int_{-\infty}^{\infty} v_j^{1/2}\exp\left[-\lambda\left|r_j\right|^{\delta}\right] dr_j. \qquad (4.3.41)$$

We need the following lemma before we proceed.


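Lemma 4.3.1 For $\delta > 1$,

$$\int_{-\infty}^{\infty}\exp\left[-\lambda|z|^{\delta}\right] dz \le 2\left[1 + \frac{\exp[-\lambda]}{\lambda}\right]. \qquad (4.3.42)$$

Proof of Lemma 4.3.1

$$\int_{-\infty}^{\infty}\exp\left[-\lambda|z|^{\delta}\right] dz = 2\int_0^{\infty}\exp\left[-\lambda z^{\delta}\right] dz = 2\left[\int_0^1\exp\left[-\lambda z^{\delta}\right] dz + \int_1^{\infty}\exp\left[-\lambda z^{\delta}\right] dz\right] \le 2\left[1 + \frac{\exp[-\lambda]}{\lambda}\right], \qquad (4.3.43)$$

since $z^{\delta} \ge z$ for $z \ge 1$.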
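Lemma 4.3.1 can be sanity-checked against the closed form $\int_{-\infty}^{\infty}\exp[-\lambda|z|^{\delta}]\,dz = 2\,\Gamma(1 + 1/\delta)\,\lambda^{-1/\delta}$, which follows from the substitution $w = \lambda z^{\delta}$. The sketch below evaluates both sides on a small grid; the grid values and function names are arbitrary illustrative choices.

```python
import math

def exact_integral(lam, delta):
    """Closed form of the integral of exp(-lam * |z|**delta) over the real line."""
    return 2.0 * math.gamma(1.0 + 1.0 / delta) * lam ** (-1.0 / delta)

def lemma_bound(lam):
    """Right-hand side of (4.3.42): 2 * (1 + exp(-lam) / lam)."""
    return 2.0 * (1.0 + math.exp(-lam) / lam)

# Compare the exact value with the bound for several (lam, delta) pairs, delta > 1.
checks = [(lam, delta, exact_integral(lam, delta), lemma_bound(lam))
          for lam in (0.01, 0.1, 1.0, 10.0, 100.0)
          for delta in (1.01, 1.5, 2.0, 5.0)]

all_hold = all(exact <= bound for _, _, exact, bound in checks)
print(all_hold)
```

The bound is tightest when $\lambda$ is small and $\delta$ is close to 1, where both sides behave like $2/\lambda$.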
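For $\lambda > 1$, Lemma 4.3.1 gives

$$\int_{-\infty}^{\infty}\exp\left[-\lambda|z|^{\delta}\right] dz \le 2\left[1 + \frac{\exp[-\lambda]}{\lambda}\right] \le 4 \qquad (4.3.44)$$

and for $0 < \lambda \le 1$, Lemma 4.3.1 gives

$$\int_{-\infty}^{\infty}\exp\left[-\lambda|z|^{\delta}\right] dz \le 2\left[1 + \frac{\exp[-\lambda]}{\lambda}\right] \le \frac{4}{\lambda}. \qquad (4.3.45)$$

We may then use (4.3.44) and (4.3.45) to write that (4.3.38) is less than a quantity
proportional to

$$\int_0^1\frac{4^p\lambda^{n\gamma-\alpha}}{\lambda^p\,\Gamma^n(\lambda)}\, d\lambda + \int_1^{\infty}\frac{4^p\lambda^{n\gamma-\alpha}}{\Gamma^n(\lambda)}\, d\lambda. \qquad (4.3.46)$$

Using the relation $\Gamma(\lambda+1) = \lambda\Gamma(\lambda)$ we see that the first integral in (4.3.46) is finite
for $n\gamma - \alpha + n - p > -1$. Using Stirling's formula for large $\lambda$, $\Gamma(\lambda) \approx \sqrt{2\pi}\,\lambda^{\lambda-1/2}e^{-\lambda}$,
we observe that the integrand of the second integral in (4.3.46) behaves as

$$\frac{\lambda^{n\gamma-\alpha+n/2}\,e^{n\lambda}}{(2\pi)^{n/2}\lambda^{n\lambda}} = \frac{\lambda^{n\gamma-\alpha+n/2}\,e^{n\lambda(1-\log\lambda)}}{(2\pi)^{n/2}}, \qquad (4.3.47)$$

which is integrable in the range $(1,\infty)$. Hence (4.3.46) is finite and the posterior in
the general regression setting is established to be proper for the power density.

Remark 4.3.1 Note that when $\delta = 2$ the normal density is obtained.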

4.4    Establishing  Propriety  for  the  Inverse  Gaussian  Distribution 


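Suppose $Y_1,\ldots,Y_n$ are independently distributed with an inverse Gaussian density
of the form

$$f(y_i \mid \mu_i,\lambda_i) = \sqrt{\frac{\lambda_i}{2\pi}}\; y_i^{-3/2}\exp\left[-\lambda_i\frac{(y_i-\mu_i)^2}{2\mu_i^2 y_i}\right], \qquad (4.4.48)$$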
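where $y_i > 0$, $\lambda_i > 0$, and $\mu_i > 0$. Further we assume that

$$\mu_i = \beta x_i \qquad (4.4.49)$$

and that

$$\lambda_i = \lambda x_i^2 \qquad (4.4.50)$$

where $\beta > 0$ and $x_i > 0$. This represents a regression model through the origin. From
results in Section 2.2.4, we have that

$$\pi_{Jeffreys'} \propto \beta^{-3/2}\lambda^{-1/2} \qquad (4.4.51)$$

and that

$$\pi_{ref} \propto \beta^{-3/2}\lambda^{-1}. \qquad (4.4.52)$$

We then have that

$$\Pi(\beta,\lambda \mid y) \propto \beta^{-3/2}\lambda^{\alpha}\lambda^{n/2}\exp\left[-\frac{\lambda}{2}\sum_{i=1}^n\frac{(y_i-\beta x_i)^2}{\beta^2 y_i}\right], \qquad (4.4.53)$$

where $\alpha = -1/2$ for the posterior under Jeffreys' prior and $\alpha = -1$ for the posterior
under the reference prior. Since $\Pi(\beta,\lambda \mid y)$ is a gamma density in $\lambda$, we have for
$n/2 + \alpha + 1 > 0$

$$\Pi(\beta \mid y) \propto \frac{\beta^{-3/2}}{\left[\sum_{i=1}^n (y_i-\beta x_i)^2/(\beta^2 y_i)\right]^{n/2+\alpha+1}}, \qquad (4.4.54)$$

as long as $y_i \ne \beta x_i$ for all $i$. As in the proof in Section 2.2.4, we let $t = 1/\beta$. A change
of variables gives

$$\int_0^{\infty}\Pi(\beta \mid y)\, d\beta \propto \int_0^{\infty}\frac{t^{-1/2}}{\left[\sum_{i=1}^n (t y_i - x_i)^2/y_i\right]^{n/2+\alpha+1}}\, dt. \qquad (4.4.55)$$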
Let $g(t) = \left[\sum_{i=1}^n (t y_i - x_i)^2/y_i\right]^{-(n/2+\alpha+1)}$. For any value of $t_0$, $g(t)$ is continuous,
therefore bounded on $[0,t_0]$, say $g(t) \le M$ on $t \in [0,t_0]$. Then

$$\int_0^{t_0} t^{-1/2}g(t)\, dt \le \int_0^{t_0} t^{-1/2}M\, dt = 2t_0^{1/2}M. \qquad (4.4.56)$$

Since

$$t^{-1/2}g(t) \sim t^{-(n+2\alpha+5/2)}, \qquad (4.4.57)$$

for sufficiently large $t$, say $t > t_0$, we have for some constant $K$

$$\int_0^{\infty}\Pi(\beta \mid y)\, d\beta \propto \int_0^{t_0} t^{-1/2}g(t)\, dt + \int_{t_0}^{\infty} t^{-1/2}g(t)\, dt \le 2t_0^{1/2}M + \int_{t_0}^{\infty}\frac{K}{t^{\,n+2\alpha+5/2}}\, dt = 2t_0^{1/2}M + \frac{K\,t_0^{-(n+2\alpha+3/2)}}{n+2\alpha+3/2}, \qquad (4.4.58)$$

which is finite. The propriety of the posterior under Jeffreys' prior and the reference
prior is thus established.
4.5    Application of Regression Through the Origin

Whitmore (1986) presents a ratio estimation model based on the inverse Gaussian
distribution. Ratio estimation typically is based on a regression model in which the
mean of the $i$th response $Y_i$ is proportional to the level $x_i$ of an explanatory variable,
i.e. $E[Y_i] = \beta x_i$ for some constant $\beta$. Generally, the variance of $Y_i$ is also a function of
$x_i$. As an illustration of ratio estimation, consider the following real application given
by Whitmore (1986). A market survey organization produces and sells a report which
gives projections (estimates) of annual dollar sales for all products of all companies in
a particular consumer-product industry (a total of $N$ products). The projections are
made by monitoring sales amounts in a panel of retail sales outlets. Any company
in the industry which purchases the survey report is able to compare the actual
sales amounts for its $n$ products $(Y_i;\ i = 1,\ldots,n)$ with the projected amounts in the
report $(x_i;\ i = 1,\ldots,n)$, but must infer the actual sales amounts for competitors'
products $(Y_i;\ i = n+1,\ldots,N)$ from the corresponding projected amounts appearing
in the report $(x_i;\ i = n+1,\ldots,N)$. A good estimate for $\beta$ enables a company
to better infer the actual sales amounts of its competitors. The inverse Gaussian
distribution is highly appropriate for right-skewed positive valued responses. The data
in this example are well modeled by the inverse Gaussian distribution, which allows
for varying degrees of skewness. As the dollar amount of the projections increases,
the skewness in actual sales will also increase. Table 4.1 contains the projected and
actual sales amounts for 20 products of one company, arrayed by magnitude of the
projected sales amount.

Table 4.1. Projected and Actual Consumer Product Sales in Hundreds of Thousands
of Dollars

Projected    Actual
5959         5673
3534         3659
2641         2565
1965         2182
1738         1839
1182         1236
 667          918
 613          902
 610          756
 549          500
 527          487
 353          463
 331          225
 290          257
 253          311
 193          212
 156          166
 133          123
 122          198

The propriety results of the last section allow us to proceed with a Bayesian
analysis of the data. Suppose $Y_1,\ldots,Y_n$ are independently distributed with an inverse
Gaussian density of the form

$$f(y_i \mid \mu_i,\lambda_i) = \sqrt{\frac{\lambda_i}{2\pi}}\; y_i^{-3/2}\exp\left[-\lambda_i\frac{(y_i-\mu_i)^2}{2\mu_i^2 y_i}\right], \qquad (4.5.59)$$

where $y_i > 0$, $\lambda_i > 0$, and $\mu_i > 0$. Further we assume that

$$\mu_i = \beta x_i \qquad (4.5.60)$$

and that

$$\lambda_i = \lambda x_i^2 \qquad (4.5.61)$$

where $\beta > 0$ and $x_i > 0$. We computed posterior modes of $\beta$ and $\lambda$ under Jeffreys'
prior and the reference prior. We also calculated credible sets for $\beta$ under Jeffreys'
prior and the reference prior. Table 4.2 summarizes the calculations and gives the
frequentist results reported in Whitmore (1986).

Table 4.2. Inference Results

                                 Jeffreys'        Reference        Whitmore
Posterior mode $\beta$           1.0325           1.0333           MLE $\beta$: 1.0383
Posterior mode $\lambda$         .0553            .0524            MLE $\lambda$: .05838
95% credible set for $\beta$     (.960, 1.104)    (.945, 1.104)    95% confidence interval for $\beta$: (.979, 1.106)

There is no appreciable difference here between Jeffreys' prior and the reference
prior. The posterior modes of $\beta$ and $\lambda$ under both Jeffreys' prior and the reference
prior are very close to the point estimates of $\beta$ and $\lambda$ given by Whitmore. The 95%
credible sets for $\beta$ under both Jeffreys' prior and the reference prior are similar to the
95% confidence interval for $\beta$ which Whitmore reports. From Table 3.1 we note that
both Jeffreys' prior and the reference prior are first order optimal for $\beta$ for the inverse
Gaussian distribution. Either noninformative prior is an acceptable choice for this
example.


CHAPTER  5 
SUMMARY  AND  FUTURE  RESEARCH 

5.1  Summary

The Jørgensen dispersion model class has many desirable features which make it
an excellent class around which to develop a general theory. The class includes not
only all the densities which can be incorporated into a generalized linear model, but
also the important Fisher-von Mises, Student's t, and power distributions. We have
shown that the Bar-Lev and Reiser two-parameter exponential family forms a subtype
of Jørgensen dispersion models. Since the information matrix of Jørgensen dispersion
models can be written in block diagonal form, we can easily formulate reference and
probability matching priors from the results given in this work.

Posteriors derived from either the reference prior or Jeffreys' prior with Jørgensen
dispersion model likelihoods were shown to be legitimate for selected distributions.
The distributions selected were the normal, lognormal, gamma, inverse Gaussian,
Fisher-von Mises, Student's t, and power distributions.

Probability matching priors offer a powerful method for approximating frequentist
confidence intervals. The probability matching criterion also gives a Bayesian a
reasonable basis for the selection of a noninformative prior. We were able to formulate
probability matching priors for Jørgensen dispersion models. Computer simulations
demonstrated that even for small samples, credible sets obtained using probability
matching priors more closely approximated confidence intervals than credible sets
obtained with other noninformative priors. Inference made for two real data sets also
demonstrated the matching property.

Regression based on Jørgensen dispersion models provides a valuable inference
structure for analyzing real life data. We were able to give the form of the reference
prior in this setting. Sufficient conditions were given to ensure propriety of the
posterior under Jeffreys' or the reference prior.
5.2    Ideas for Future Research

Are there any densities which are Jørgensen dispersion models but not
Jørgensen proper dispersion models?

As yet we do not know the answer. In a recent conversation with Jørgensen, we
have learned that Jørgensen himself does not know the answer to this question.

Are some probability matching priors distinct from reference priors?

We found second order probability matching priors which were distinct from
reference priors obtained from a certain compactification of the parameter space. Are
these second order probability matching priors reference priors under a
different compactification?

Extend Bayesian inference for directional data.

Morrison's (1995) work on the offset normal distribution opens the door for a
Bayesian treatment. See Mardia (1979, p. 52) for an introduction to the offset
normal distribution. Morrison has shown that the offset normal distribution lends
itself nicely to the modeling of directional data with covariates. Formulations for
Jeffreys', reference, and probability matching priors could be found for the offset
normal distribution. Propriety results for the posterior under these noninformative
priors could be established. Also of interest would be the specification of informative
priors for the offset normal.

Formulate HPD matching priors for Jørgensen dispersion models.

Recent work has been done by Mukerjee and J.K. Ghosh (1994) giving conditions
which characterize priors which ensure frequentist validity of credible regions based on
the highest posterior density (HPD). This work could readily be applied to Jørgensen
dispersion models.

Establish general regression propriety results for Jørgensen dispersion
models.

In the case of an unknown $\lambda$ the sufficient condition given in Theorem 4.2.2 is not
very satisfactory. Much work could be done here to simplify this condition as in the
case where we can assume $\lambda$ to be known. General propriety results can be established
for the gamma, Student's t, Fisher-von Mises, and inverse Gaussian distributions.


APPENDIX  A 
PROGRAMS  FOR  LOGNORMAL 

Estimating $P_{(\mu,\lambda)}(\alpha;\mu)$ for the Lognormal Distribution

Description:  Computes Frequentist coverage probabilities
Date:  Tuesday, Sept 26, 1995
Dir:  lognormal
File:  lnjeff_mu

Description:  Computes Frequentist coverage probabilities
Array  |  Description

Y      |  The data set of sums of logarithms of observations.

X      |  The data set of sums of squared logarithms of observations.

readlib(write):
PROCEDURES

mucheck := proc(n, i1, i2)
    local st, i, sy, isy, f1, f2:
    global Y, X, count1, count2:
    st := time():
    count1 := 0:
    count2 := 0:
    for i from i1 to i2 do
        # Write a progress marker to the status file every tenth data set.
        im := modp(i, 10):
        if im = 0 then
            open(convert(cat(istatus, ` `, i1, ` `, i2), string)):
            print(` MUCHECK i=`, i);
            close(convert(cat(istatus, ` `, i1, ` `, i2), string)):
        fi:
        print(` MUCHECK i=`, i);
        sly := Y[i]:
        slysq := X[i]:
        print(`sly=`, sly);
        # Normalizing constant: integrate the unnormalized marginal posterior of u over the real line.
        dd1 := (Int((slysq - 2*u*sly + n*(u^2))^(-n/2), u = 0..infinity));
        dd1 := evalf(dd1):
        dd2 := (Int((slysq - 2*u*sly + n*(u^2))^(-n/2), u = -1*infinity..0));
        dd2 := evalf(dd2):
        dd := evalf(dd1 + dd2);
        print(`dd=`, dd);
        c := evalf(1/dd);
        print(` c=`, c);
        # Posterior probability that u is below 0.
        f1 := evalf(Int((slysq - 2*u*sly + n*(u^2))^(-n/2), u = -1*infinity..0));
        print(` f1=`, f1);
        f2 := evalf(c*f1);
        if f2 < .05 then
            count1 := count1 + 1:
        fi:
        if f2 < .95 then
            count2 := count2 + 1:
        fi:
    od:
    print(`time=`, -st + time()):
    open(convert(cat(countfile, ` `, i1, ` `, i2), string)):
    print(`count1=`, count1, `count2=`, count2);
    close(convert(cat(countfile, ` `, i1, ` `, i2), string)):
    close(countfile):
    print(`count1=`, count1, `count2=`, count2);
    RETURN(count1, count2):
end:

Estimating  F(/i,  A) (a;  A)  for  the  Lognormal  Distribution 

Description:  Computes  Frequentist  coverage  probabilities 
Date:  Tuesday.  Sept  26,  1995 
Dir :  lognormal 
File:  Injeff .lambda 

Description:  Computes  Frequentist  coverage  probabilities 
ARRAYS 

Array  I  Description 

Y        I  The    data  set  of  sums  of  logarithms  of  observations. 

X        I  The    data  set  of  sums  of  squared  logarithms  of  observations. 

readlib(write) : 
PROCEDURES 


mucheck : =proc (n , i 1 , i2) 

local  st,i,sy,isy,f l,f2: 
global  Y,X, count l,coimt2: 
st:=time() : 
count 1 :=0: 
count2:=0: 

for  i  from  il  to  i2  do 
iin:=modp(i,  10) : 
if  im=0  then 

open (convert (cat (istatus,  "  ,il, ' ' ,i2) .string)) : 

print ('  MUCHECK  i=',i); 
close (convert (cat (istatus, ' ' ,il, ' ' ,i2) .string)) : 
fi: 

print ('  MUCHECK  i='.i); 
sly  :=Y[i]: 
slysq:=X[i] : 
print ('sly=' ,sly) ; 

g:=(l/(2-(n/2)*GAMMA(n/2))  *(slysq  -  ( (1/n) * (sly-2) ) ) " (n/2) ) ; 
f:=    l-((n/2)-l)  *  exp(  -(l/2)*(slysq  -  (1/n) *sly-2) ) ; 
f2:=evalf (g*Int(f ,1=0. .1)) ; 
if f2<.05 then
count1:=count1+1:
fi:

if f2<.95 then
count2:=count2+1:
fi:

od: 

print(`time=`,-st+time()):
open(convert(cat(countfile,` `,i1,` `,i2),string)):
print(`count1=`,count1,`count2=`,count2);

close(convert(cat(countfile,` `,i1,` `,i2),string)):
close(countfile):

print(`count1=`,count1,`count2=`,count2);

RETURN(count1,count2):

end: 
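
Reading the integrand in the listing above as $\ell^{n/2-1}e^{-\ell\,SS/2}$, with $\ell$ standing for the integration variable `l`, the quantity `f2` is a gamma distribution function evaluated at 1. The symbols $S_1$, $S_2$ and $SS$ are shorthand introduced here for `sly`, `slysq` and `slysq - sly^2/n`; they do not appear in the listing.

```latex
\[
f_2 \;=\; \frac{SS^{\,n/2}}{2^{n/2}\,\Gamma(n/2)}
\int_{0}^{1} \ell^{\,n/2-1}\, e^{-\ell\, SS/2}\, d\ell ,
\qquad SS \;=\; S_2 - \frac{S_1^{2}}{n},
\]
\[
\int_{0}^{\infty} \ell^{\,n/2-1}\, e^{-\ell\, SS/2}\, d\ell
\;=\; \Gamma(n/2)\,\Bigl(\frac{2}{SS}\Bigr)^{n/2}
\qquad (SS > 0).
\]
```

The second line is the standard gamma integral. It shows that the factor `g` is the reciprocal of the integral over $(0,\infty)$, so under this reading `f2` is the distribution function at 1 of a gamma law with shape $n/2$ and rate $SS/2$.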


APPENDIX  B 
PROGRAMS  FOR  INVERSE  GAUSSIAN 

Estimating $P_{(\mu,\lambda)}(\alpha;\mu)$ for the Inverse Gaussian Distribution
Description:  Computes  Frequentist  coverage  probabilities 
Date:  Tuesday,  Sept  19,  1995 
Dir:  simul 
File: ig10m

Description:  Computes  Frequentist  coverage  probabilities 
ARRAYS

Array | Description
Y     | The data set of sums.
X     | The data set of sums of inverses.

readlib (write) : 
PROCEDURES 

mucheck:=proc(n,i1,i2)

local st,i,sy,isy,f1,f2:
global Y,X,count1,count2:
st:=time():
count1:=0:
count2:=0:

for i from i1 to i2 do
im:=modp(i,10):
if im=0 then

open(convert(cat(istatus,` `,i1,` `,i2),string)):

print(` MUCHECK i=`,i);
close(convert(cat(istatus,` `,i1,` `,i2),string)):
fi:

print(` MUCHECK i=`,i);

sy:=Y[i]:

isy:=X[i]:

print(`sy=`,sy);

# Normalizing constant: integrate the kernel over z in (0, infinity).
dd:=(Int(2*(sy*(z^4)/2 - n*(z^2) + isy/2)^(-(n+1)/2),
    z=0..infinity));
dd:=evalf(dd):
print(`dd=`,dd);
c:=evalf(1/dd);
print(` c=`,c);

# Normalized mass of the kernel on z >= sqrt(2).
f1:=evalf(Int(2*(sy*(z^4)/2 - n*(z^2) + isy/2)^(-(n+1)/2),
    z=sqrt(2)..infinity));
print(` f1=`,f1);
f2:=evalf(c*f1);
if f2<.05 then
count1:=count1+1:
fi:

if f2<.95 then
count2:=count2+1:
fi:


od: 

print(`time=`,-st+time()):
open(convert(cat(countfile,` `,i1,` `,i2),string)):

print(`count1=`,count1,`count2=`,count2);
close(convert(cat(countfile,` `,i1,` `,i2),string)):
close(countfile):

print(`count1=`,count1,`count2=`,count2);

RETURN(count1,count2):

end: 
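
As a formula, the quantity `f2` in the listing above is the share of the integral over $z \in (0,\infty)$ that lies above $z = \sqrt{2}$. In this sketch, $S$ and $T$ are shorthand introduced here for the program variables `sy` and `isy`. The second identity further assumes that these are the sum and the sum of inverses of $n$ positive observations $y_1,\dots,y_n$, as the array descriptions suggest.

```latex
\[
f_2 \;=\;
\frac{\displaystyle\int_{\sqrt{2}}^{\infty}
      2\Bigl(\tfrac{1}{2}S z^{4} - n z^{2} + \tfrac{1}{2}T\Bigr)^{-(n+1)/2} dz}
     {\displaystyle\int_{0}^{\infty}
      2\Bigl(\tfrac{1}{2}S z^{4} - n z^{2} + \tfrac{1}{2}T\Bigr)^{-(n+1)/2} dz},
\]
\[
\tfrac{1}{2}S z^{4} - n z^{2} + \tfrac{1}{2}T
\;=\; \sum_{i=1}^{n} \frac{\bigl(y_i z^{2} - 1\bigr)^{2}}{2\,y_i},
\qquad S = \sum_{i=1}^{n} y_i,\quad T = \sum_{i=1}^{n} \frac{1}{y_i}.
\]
```

The constant factor 2 cancels in the ratio. The second identity shows that the base of the power is nonnegative under the stated assumption, so the negative fractional power is real-valued wherever the base is nonzero.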


APPENDIX  C 
PROGRAMS FOR FISHER-VON MISES


Estimating $P_{(\mu,\lambda)}(\alpha;\mu)$ for the Fisher-Von Mises Distribution
Description:  Computes  Frequentist  coverage  probabilities 
Date:  Jan  4,  1996 
Dir:  vonmises 
File:  guts 

Description:  Computes  denominator 
ARRAYS
Array | Description

C     | The data set of sum of cosines of observations.
S     | The data set of sum of sines of observations.

readlib(write):
readlib(`evalf/int`):
PROCEDURES

mucheck:=proc(n,i1,i2)
local st,i,c,s,r,f1,f2:
global C,S:

for i from i1 to i2 do

print(` MUCHECK i=`,i);
st:=time():
c:=C[i]:
s:=S[i]:

r:=sqrt((c^2)+(s^2)):
t:=arctan(s,c):
# f depends on k only through the ratio BesselI(1,k)/BesselI(0,k);
# g carries the dependence on the data through r.
f:=sqrt(abs(k-BesselI(1,k)/BesselI(0,k)
    - k*(BesselI(1,k)/BesselI(0,k))^2))*
    sqrt(BesselI(1,k)/BesselI(0,k)):
g:=BesselI(0,(r*(k)))*(1/BesselI(0,((k)))^n):
dd:=Int(f*g,k=0..15);

dd:=evalf(dd):

cc:=evalf(1/(2*Pi*dd));
appendto(cc_file):

lprint(i,cc,r):
appendto(terminal):
fi: 
od: 

end: 

Description:  Computes  Frequentist  coverage  probabilities 
Date:  Oct  19,  1995 
Dir:  vonMises 
File:  vmref 

Description:  Computes  numerator 
PROCEDURES 

mucheck:=proc(n,c,s)


local st,i,r,f1,ed:
st:=time():
r:=sqrt((c^2)+(s^2)):
t:=arctan(s,c):
f:=sqrt(abs(k-BesselI(1,k)/BesselI(0,k)
    - k*(BesselI(1,k)/BesselI(0,k))^2))*
    sqrt(BesselI(1,k)/BesselI(0,k)):
h:=exp(k*r*cos(t-m))*(1/BesselI(0,(k))^n);
f1:=`evalf/int`(Int(f*h,m=-Pi..0),k=0..4,4);
ed:=time():
tim:=ed - st:
appendto(jf11):

lprint(c,s,f1,tim):
appendto(terminal):
end:

mucheck(5,c,s):
quit 
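
The two Fisher-Von Mises listings above can be summarized as formulas. In this sketch, $A(k)$, $f(k)$, $I_0$ and $I_1$ are shorthand introduced here for the Bessel expressions in the code, and the integration limits are copied from the listings as written, including the different upper limits for $k$ (15 and 4).

```latex
\[
A(k) = \frac{I_1(k)}{I_0(k)}, \qquad
f(k) = \sqrt{\bigl|\,k - A(k) - k\,A(k)^{2}\bigr|}\;\sqrt{A(k)},
\]
\[
\mathtt{cc} \;=\; \Bigl[\,2\pi \int_{0}^{15} f(k)\,\frac{I_0(rk)}{I_0(k)^{n}}\,dk \Bigr]^{-1},
\qquad
\mathtt{f1} \;=\; \int_{0}^{4}\!\!\int_{-\pi}^{0}
   f(k)\,\frac{e^{\,k r \cos(t-m)}}{I_0(k)^{n}}\,dm\,dk,
\]
\[
\int_{-\pi}^{\pi} e^{\,k r \cos(t-m)}\,dm \;=\; 2\pi\, I_0(kr).
\]
```

The last identity is the standard integral representation of the modified Bessel function $I_0$. It explains why the denominator program needs only a one-dimensional integral in $k$: integrating the exponential over the full circle in $m$ yields the factor $2\pi I_0(rk)$ that appears in `cc`.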